Topological Data Analysis of Decision Boundaries with Application to Model Selection
Authors: Karthikeyan Natesan Ramamurthy, Kush Varshney, Krishnan Mody
ICML 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our main objective is quantification of deep neural network complexity to enable matching of datasets to pre-trained models to facilitate the functioning of AI marketplaces; we report results for experiments using MNIST, Fashion MNIST, and CIFAR10. We perform experiments with synthetic and high-dimensional real-world datasets to demonstrate: (a) the effectiveness of our approach in recovering homology groups accurately, and (b) the utility of this method in discovering the Decision Boundary Topological Complexity (DBTC) of neural networks and its potential use in choosing pre-trained models for a new dataset. |
| Researcher Affiliation | Collaboration | (1) IBM Research, Yorktown Heights, NY, USA; (2) Courant Institute, New York University, New York City, NY, USA. |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Implementations of the approaches proposed in this work are available at: https://github.com/nrkarthikeyan/topology-decisionboundaries |
| Open Datasets | Yes | We consider three application domains for our evaluation: MNIST (LeCun & Cortes, 2009), Fashion MNIST (Xiao et al., 2017) and CIFAR10 (Krizhevsky & Hinton, 2009). All three applications have 10 classes and 50,000 training and 10,000 test images. |
| Dataset Splits | No | The paper states the number of training and test images ('50,000 training and 10,000 test images') but does not specify a separate validation split or its size. |
| Hardware Specification | No | The paper mentions that 'The program runs in a single core using less than 500MB of RAM in a standard computer.' This description is too general and lacks specific details such as CPU/GPU models, processor types, or exact memory amounts to be reproducible. |
| Software Dependencies | No | The paper mentions using 'the efficient Ripser package (Bauer, 2016) and its Python interface (Nathaniel Saul, 2019)' but does not provide specific version numbers for these software components, which are required for reproducibility. A usage sketch for this interface appears below the table. |
| Experiment Setup | Yes | In all experiments, to limit the number of simplices, we upper bound the number of neighbors used to compute the neighborhood graph to 20. We construct 10 choose 2 = 45 binary classification datasets from each application. We use the standard CNN architecture provided in https://github.com/pytorch/examples/tree/master/mnist for MNIST and Fashion MNIST, and the VGG CNN configuration D (Simonyan & Zisserman, 2014) for CIFAR10. (An illustrative sketch of this setup appears below the table.) |
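
The Software Dependencies row notes the Ripser package and its Python interface without pinning versions. The block below is a minimal sketch of how that interface is commonly invoked, assuming a recent ripser.py release; the point cloud and parameters are placeholders, not the authors' released code or data.

```python
# Minimal sketch of the ripser.py interface mentioned in the Software
# Dependencies row; the paper does not pin versions, so this assumes a
# recent ripser.py release. The data and parameters are placeholders.
import numpy as np
from ripser import ripser

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))  # toy point cloud standing in for real features

# Persistent homology up to dimension 1; `thresh` caps the filtration scale.
result = ripser(X, maxdim=1, thresh=2.0)
h0, h1 = result["dgms"]        # birth/death intervals for H0 and H1
print(f"{len(h0)} H0 intervals, {len(h1)} H1 intervals")
```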
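
The Experiment Setup row mentions forming 10 choose 2 = 45 one-vs-one binary datasets per application and bounding the neighborhood graph to 20 neighbors. The sketch below illustrates those two steps under our own assumptions: the helper names `one_vs_one_datasets` and `neighborhood_graph`, the synthetic data, and the use of scikit-learn's `kneighbors_graph` are illustrative choices, not the paper's implementation.

```python
# Illustrative sketch (not the released implementation) of the setup described
# in the Experiment Setup row: 10 choose 2 = 45 one-vs-one binary datasets and
# a neighborhood graph capped at 20 neighbors per point.
from itertools import combinations

import numpy as np
from sklearn.neighbors import kneighbors_graph


def one_vs_one_datasets(X, y, num_classes=10):
    """Yield one binary dataset per unordered class pair (45 pairs for 10 classes)."""
    for a, b in combinations(range(num_classes), 2):
        mask = (y == a) | (y == b)
        yield (a, b), X[mask], (y[mask] == b).astype(int)  # relabel to {0, 1}


def neighborhood_graph(X, k=20):
    """Symmetric k-nearest-neighbor distance graph used to limit the number of simplices."""
    D = kneighbors_graph(X, n_neighbors=k, mode="distance")
    return D.maximum(D.T)  # keep an edge if either endpoint selects the other


# Tiny synthetic stand-in for one application's features and 10-class labels.
rng = np.random.default_rng(0)
X_all = rng.normal(size=(500, 8))
y_all = rng.integers(0, 10, size=500)

pairs = list(one_vs_one_datasets(X_all, y_all))
print(len(pairs), "binary datasets")               # 45

classes, X_pair, y_pair = pairs[0]
G = neighborhood_graph(X_pair, k=20)
print("pair", classes, "-> graph edges:", G.nnz)
```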