Finding the Homology of Decision Boundaries with Active Learning

Authors: Weizhi Li, Gautam Dasarathy, Karthikeyan Natesan Ramamurthy, Visar Berisha

NeurIPS 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We theoretically analyze the proposed framework and show that the query complexity of our active learning algorithm depends naturally on the intrinsic complexity of the underlying manifold. We demonstrate the effectiveness of our framework in selecting best-performing machine learning models for datasets just using their respective homological summaries. Experiments on several standard datasets show the sample complexity improvement in recovering the homology and demonstrate the practical utility of the framework for model selection.
Researcher Affiliation | Collaboration | Weizhi Li (Arizona State University, weizhili@asu.edu); Gautam Dasarathy (Arizona State University, gautamd@asu.edu); Karthikeyan Natesan Ramamurthy (IBM Research, knatesa@us.ibm.com); Visar Berisha (Arizona State University, visar@asu.edu)
Pseudocode | No | The paper describes the active learning algorithm through text and a schematic diagram (Figure 2), but it does not include a formally structured pseudocode block or algorithm listing.
Open Source Code | Yes | Source code for our algorithms and experimental results is available at https://github.com/wayne0908/Active-Learning-Homology.
Open Datasets | Yes | We implement the above procedure and evaluate on Banknote [20], MNIST [21] and CIFAR10 [22] datasets.
Dataset Splits | No | For the Banknote dataset, the paper samples 100 examples for training and uses the remaining data as both the test set and the unlabelled data pool, which is not a precise or generalizable split. For MNIST and CIFAR10, it specifies a training set of 200 samples, a test set of 2,000 samples, and an unlabelled pool of 2,000 samples, but does not define a validation set or state how the full datasets are partitioned into these subsets (a hedged split sketch appears after this table).
Hardware Specification | No | The paper does not provide specific details about the hardware used for the experiments, such as GPU models, CPU types, or memory specifications.
Software Dependencies | No | The paper mentions using 'the python wrapper of the Ripser package' and cites Scikit-tda, but does not provide version numbers for these or any other software dependencies (a minimal persistence-computation sketch follows the table).
Experiment Setup | No | The paper describes the types of classifiers used and the ranges of their parameters (e.g., k-NN with k ranging from 1 to 29), but does not provide concrete hyperparameter values, optimizer settings, or other training configurations for these models or for the authors' own framework (a hedged model-selection sketch follows the table).
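
For concreteness, the MNIST/CIFAR10 partition described in the Dataset Splits row can be sketched as below. This is a minimal illustration, not the authors' code: the random seed and the assumption that the three subsets are drawn disjointly from each dataset are ours.

    import numpy as np

    def split_indices(n_total, n_train=200, n_test=2000, n_pool=2000, seed=0):
        """Draw disjoint train / test / unlabelled-pool index sets.

        Sizes follow the MNIST/CIFAR10 description in the paper; the seed and
        the disjointness of the three subsets are illustrative assumptions.
        """
        rng = np.random.default_rng(seed)
        perm = rng.permutation(n_total)
        train = perm[:n_train]
        test = perm[n_train:n_train + n_test]
        pool = perm[n_train + n_test:n_train + n_test + n_pool]
        return train, test, pool

    # Example: partition indices over the 60,000 MNIST training images.
    train_idx, test_idx, pool_idx = split_indices(60000)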
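
Although no versions are pinned, the Ripser wrapper the paper cites (ripser.py, distributed through Scikit-tda) computes persistence diagrams with a single call. The point cloud and maxdim value below are illustrative assumptions; in the paper's pipeline the input would be samples labelled near the decision boundary.

    import numpy as np
    from ripser import ripser  # python wrapper of the Ripser package (Scikit-tda)

    # Illustrative point cloud standing in for labelled boundary samples.
    points = np.random.default_rng(0).normal(size=(100, 2))

    # Persistence diagrams up to dimension 1 (connected components and loops).
    diagrams = ripser(points, maxdim=1)["dgms"]
    print([d.shape for d in diagrams])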
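
Finally, the reported classifier range (k-NN with k from 1 to 29) suggests a sweep of the following shape. This is a hedged scikit-learn sketch, not the paper's setup: the stand-in dataset, the test split, and the use of held-out accuracy as the selection criterion are assumptions.

    from sklearn.datasets import load_digits
    from sklearn.model_selection import train_test_split
    from sklearn.neighbors import KNeighborsClassifier

    # Stand-in data; the paper evaluates on Banknote, MNIST and CIFAR10.
    X, y = load_digits(return_X_y=True)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

    # Sweep k over the range reported in the paper (1 to 29).
    scores = {}
    for k in range(1, 30):
        clf = KNeighborsClassifier(n_neighbors=k).fit(X_tr, y_tr)
        scores[k] = clf.score(X_te, y_te)

    best_k = max(scores, key=scores.get)
    print(best_k, scores[best_k])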