Finding the Homology of Decision Boundaries with Active Learning
Authors: Weizhi Li, Gautam Dasarathy, Karthikeyan Natesan Ramamurthy, Visar Berisha
NeurIPS 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We theoretically analyze the proposed framework and show that the query complexity of our active learning algorithm depends naturally on the intrinsic complexity of the underlying manifold. We demonstrate the effectiveness of our framework in selecting best-performing machine learning models for datasets just using their respective homological summaries. Experiments on several standard datasets show the sample complexity improvement in recovering the homology and demonstrate the practical utility of the framework for model selection. |
| Researcher Affiliation | Collaboration | Weizhi Li (Arizona State University, weizhili@asu.edu); Gautam Dasarathy (Arizona State University, gautamd@asu.edu); Karthikeyan Natesan Ramamurthy (IBM Research, knatesa@us.ibm.com); Visar Berisha (Arizona State University, visar@asu.edu) |
| Pseudocode | No | The paper describes the active learning algorithm through text and a schematic diagram (Figure 2), but it does not include a formally structured pseudocode block or algorithm listing. (A hedged, illustrative sketch of such a query loop appears after this table.) |
| Open Source Code | Yes | Source code for our algorithms and experimental results is available at https://github.com/wayne0908/Active-Learning-Homology. |
| Open Datasets | Yes | We implement the above procedure and evaluate on Banknote [20], MNIST [21] and CIFAR10 [22] datasets. |
| Dataset Splits | No | The paper does not specify precise, reproducible splits. For the Banknote dataset, it describes sampling 100 examples for training and using 'the remaining data as both the test set and unlabelled data pool'. For MNIST and CIFAR10, it quotes 'a training set with a sample size of 200, a test set with a sample size of 2000, and an unlabelled data pool with a sample size of 2000', but specifies neither a validation set nor how the full datasets are partitioned into these subsets. (A sketch of one plausible partition appears after the table.) |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for running the experiments, such as GPU models, CPU types, or memory specifications. |
| Software Dependencies | No | The paper mentions using 'the python wrapper of the Ripser package' and cites 'Scikit-tda', but it does not give version numbers for these or for any other software dependencies. (A usage sketch of the Ripser wrapper appears after the table.) |
| Experiment Setup | No | The paper describes the *types* of classifiers used and the *ranges* of their parameters (e.g., 'k-NN with k ranging from 1 to 29'), but it does not provide concrete hyperparameter values, optimizer settings, or other system-level training configurations for these models or for its own framework. (A sketch of such a parameter sweep appears after the table.) |
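Since the paper gives no formal pseudocode, the following is a minimal, hypothetical sketch of the kind of boundary-seeking active learning loop the Pseudocode row refers to: repeatedly query the label of the unlabelled point the current classifier is least certain about, so that labelled points accumulate near the decision boundary. Every name here (`oracle_label`, `n_seed`, `n_queries`) is our own illustrative assumption; this is not the authors' algorithm, whose query rule comes from their homology-recovery analysis.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

def active_query_loop(X_pool, oracle_label, n_seed=10, n_queries=100):
    """Hypothetical uncertainty-sampling loop: repeatedly query the
    unlabelled point the current classifier is least sure about."""
    rng = np.random.default_rng(0)
    # Seed with a few randomly labelled points.
    idx = list(rng.choice(len(X_pool), size=n_seed, replace=False))
    y = {i: oracle_label(X_pool[i]) for i in idx}
    clf = KNeighborsClassifier(n_neighbors=3)
    for _ in range(n_queries):
        labelled = np.array(idx)
        clf.fit(X_pool[labelled], [y[i] for i in idx])
        # Uncertainty = closeness of the class-0 probability to 0.5.
        proba = clf.predict_proba(X_pool)[:, 0]
        uncertainty = -np.abs(proba - 0.5)
        uncertainty[labelled] = -np.inf   # never re-query a labelled point
        q = int(np.argmax(uncertainty))
        idx.append(q)
        y[q] = oracle_label(X_pool[q])    # ask the labelling oracle
    return idx, y
```

In the paper's pipeline, the points labelled this way would feed the homological summary of the decision boundary; this sketch simply returns the queried indices and labels.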
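The MNIST/CIFAR10 sample sizes quoted in the Dataset Splits row (200 train / 2,000 test / 2,000 unlabelled pool) pin down only the subset sizes, not how the subsets were drawn. A minimal sketch of one plausible partition, assuming disjoint uniform random subsets and a seed of our own choosing:

```python
import numpy as np

def split_indices(n_total, n_train=200, n_test=2000, n_pool=2000, seed=0):
    """Draw disjoint train/test/unlabelled-pool index sets of the sizes
    quoted in the paper; disjointness and seeding are our assumptions."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(n_total)
    train = perm[:n_train]
    test = perm[n_train:n_train + n_test]
    pool = perm[n_train + n_test:n_train + n_test + n_pool]
    return train, test, pool

# Example: partitioning the 60,000-image MNIST training set.
train_idx, test_idx, pool_idx = split_indices(60000)
```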
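The Python wrapper of Ripser that the paper cites (distributed via Scikit-tda as `ripser.py`) computes persistence diagrams from a point cloud. A minimal usage sketch on a toy noisy circle, whose H1 diagram should show one prominent loop; the paper gives no versions, and this call signature reflects recent releases of the package:

```python
import numpy as np
from ripser import ripser  # python wrapper cited by the paper (Scikit-tda)

# Toy point cloud: a noisy circle sampled at 100 points.
rng = np.random.default_rng(0)
theta = rng.uniform(0, 2 * np.pi, 100)
X = np.c_[np.cos(theta), np.sin(theta)] + 0.05 * rng.normal(size=(100, 2))

# Persistence diagrams up to dimension 1: [H0 diagram, H1 diagram].
diagrams = ripser(X, maxdim=1)["dgms"]
print("H1 features (birth, death):", diagrams[1])
```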
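The only concrete range quoted in the Experiment Setup row is k-NN with k from 1 to 29. A minimal sketch of such a sweep; the dataset, split, and accuracy scoring below are our own stand-ins, not the paper's configuration:

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Stand-in dataset and split; the paper does not specify either.
X, y = load_digits(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.5, random_state=0)

scores = {}
for k in range(1, 30):  # k = 1, ..., 29, as quoted in the paper
    clf = KNeighborsClassifier(n_neighbors=k).fit(X_tr, y_tr)
    scores[k] = clf.score(X_te, y_te)

best_k = max(scores, key=scores.get)
print(f"best k = {best_k}, accuracy = {scores[best_k]:.3f}")
```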