Topological Data Analysis of Decision Boundaries with Application to Model Selection
Authors: Karthikeyan Natesan Ramamurthy, Kush Varshney, Krishnan Mody
ICML 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our main objective is quantification of deep neural network complexity to enable matching of datasets to pre-trained models to facilitate the functioning of AI marketplaces; we report results for experiments using MNIST, Fashion MNIST, and CIFAR10. We perform experiments with synthetic and high-dimensional real-world datasets to demonstrate: (a) the effectiveness of our approach in recovering homology groups accurately, and (b) the utility of this method in discovering the Decision Boundary Topological Complexity (DBTC) of neural networks and its potential use in choosing pre-trained models for a new dataset. |
| Researcher Affiliation | Collaboration | (1) IBM Research, Yorktown Heights, NY, USA; (2) Courant Institute, New York University, New York City, NY, USA. |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Implementations of the approaches proposed in this work are available at: https://github.com/nrkarthikeyan/topology-decisionboundaries |
| Open Datasets | Yes | We consider three application domains for our evaluation: MNIST (LeCun & Cortes, 2009), Fashion MNIST (Xiao et al., 2017) and CIFAR10 (Krizhevsky & Hinton, 2009). All three applications have 10 classes and 50,000 training and 10,000 test images. |
| Dataset Splits | No | The paper states the number of training and test images ('50,000 training and 10,000 test images') but does not specify a separate validation split or its size. |
| Hardware Specification | No | The paper mentions that 'The program runs in a single core using less than 500MB of RAM in a standard computer.' This description is too general and lacks specific details such as CPU/GPU models, processor types, or exact memory amounts to be reproducible. |
| Software Dependencies | No | The paper mentions using 'the efficient Ripser package (Bauer, 2016) and its Python interface (Nathaniel Saul, 2019)' but does not provide specific version numbers for these software components, which are required for reproducibility. A usage sketch for this interface appears below the table. |
| Experiment Setup | Yes | In all experiments, to limit the number of simplices, we upper bound the number of neighbors used to compute the neighborhood graph to 20. We construct 10 choose 2 = 45 binary classification datasets from each application. We use the standard CNN architecture provided in https://github.com/pytorch/examples/tree/master/mnist for MNIST and Fashion MNIST, and the VGG CNN configuration D (Simonyan & Zisserman, 2014) for CIFAR10. (An illustrative sketch of this setup appears below the table.) |
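
The Software Dependencies row notes the Ripser package and its Python interface without pinning versions. The block below is a minimal sketch of how that interface is commonly invoked, assuming a recent ripser.py release; the point cloud and parameters are placeholders, not the authors' released code or data.

```python
# Minimal sketch of the ripser.py interface mentioned in the Software
# Dependencies row; the paper does not pin versions, so this assumes a
# recent ripser.py release. The data and parameters are placeholders.
import numpy as np
from ripser import ripser

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))  # toy point cloud standing in for real features

# Persistent homology up to dimension 1; `thresh` caps the filtration scale.
result = ripser(X, maxdim=1, thresh=2.0)
h0, h1 = result["dgms"]        # birth/death intervals for H0 and H1
print(f"{len(h0)} H0 intervals, {len(h1)} H1 intervals")
```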
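
The Experiment Setup row mentions forming 10 choose 2 = 45 one-vs-one binary datasets per application and bounding the neighborhood graph to 20 neighbors. The sketch below illustrates those two steps under our own assumptions: the helper names `one_vs_one_datasets` and `neighborhood_graph`, the synthetic data, and the use of scikit-learn's `kneighbors_graph` are illustrative choices, not the paper's implementation.

```python
# Illustrative sketch (not the released implementation) of the setup described
# in the Experiment Setup row: 10 choose 2 = 45 one-vs-one binary datasets and
# a neighborhood graph capped at 20 neighbors per point.
from itertools import combinations

import numpy as np
from sklearn.neighbors import kneighbors_graph


def one_vs_one_datasets(X, y, num_classes=10):
    """Yield one binary dataset per unordered class pair (45 pairs for 10 classes)."""
    for a, b in combinations(range(num_classes), 2):
        mask = (y == a) | (y == b)
        yield (a, b), X[mask], (y[mask] == b).astype(int)  # relabel to {0, 1}


def neighborhood_graph(X, k=20):
    """Symmetric k-nearest-neighbor distance graph used to limit the number of simplices."""
    D = kneighbors_graph(X, n_neighbors=k, mode="distance")
    return D.maximum(D.T)  # keep an edge if either endpoint selects the other


# Tiny synthetic stand-in for one application's features and 10-class labels.
rng = np.random.default_rng(0)
X_all = rng.normal(size=(500, 8))
y_all = rng.integers(0, 10, size=500)

pairs = list(one_vs_one_datasets(X_all, y_all))
print(len(pairs), "binary datasets")               # 45

classes, X_pair, y_pair = pairs[0]
G = neighborhood_graph(X_pair, k=20)
print("pair", classes, "-> graph edges:", G.nnz)
```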