Finding the Homology of Decision Boundaries with Active Learning
Authors: Weizhi Li, Gautam Dasarathy, Karthikeyan Natesan Ramamurthy, Visar Berisha
NeurIPS 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We theoretically analyze the proposed framework and show that the query complexity of our active learning algorithm depends naturally on the intrinsic complexity of the underlying manifold. We demonstrate the effectiveness of our framework in selecting best-performing machine learning models for datasets just using their respective homological summaries. Experiments on several standard datasets show the sample complexity improvement in recovering the homology and demonstrate the practical utility of the framework for model selection. |
| Researcher Affiliation | Collaboration | Weizhi Li (Arizona State University, weizhili@asu.edu); Gautam Dasarathy (Arizona State University, gautamd@asu.edu); Karthikeyan Natesan Ramamurthy (IBM Research, knatesa@us.ibm.com); Visar Berisha (Arizona State University, visar@asu.edu) |
| Pseudocode | No | The paper describes the active learning algorithm through text and a schematic diagram (Figure 2), but it does not include a formally structured pseudocode block or algorithm listing. (A hedged, illustrative sketch of such a query loop appears after this table.) |
| Open Source Code | Yes | Source code for our algorithms and experimental results is available at https://github.com/wayne0908/Active-Learning-Homology. |
| Open Datasets | Yes | We implement the above procedure and evaluate on Banknote [20], MNIST [21] and CIFAR10 [22] datasets. |
| Dataset Splits | No | The paper does not specify precise, reproducible splits. For the Banknote dataset, it describes sampling 100 examples for training and using 'the remaining data as both the test set and unlabelled data pool'. For MNIST and CIFAR10, it quotes 'a training set with a sample size of 200, a test set with a sample size of 2000, and an unlabelled data pool with a sample size of 2000', but specifies neither a validation set nor how the full datasets are partitioned into these subsets. (A sketch of one plausible partition appears after the table.) |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for running the experiments, such as GPU models, CPU types, or memory specifications. |
| Software Dependencies | No | The paper mentions using 'the python wrapper of the Ripser package' and cites 'Scikit-tda', but it does not give version numbers for these or for any other software dependencies. (A usage sketch of the Ripser wrapper appears after the table.) |
| Experiment Setup | No | The paper describes the *types* of classifiers used and the *ranges* of their parameters (e.g., 'k-NN with k ranging from 1 to 29'), but it does not provide concrete hyperparameter values, optimizer settings, or other system-level training configurations for these models or for its own framework. (A sketch of such a parameter sweep appears after the table.) |
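Since the paper gives no formal pseudocode, the following is a minimal, hypothetical sketch of the kind of boundary-seeking active learning loop the Pseudocode row refers to: repeatedly query the label of the unlabelled point the current classifier is least certain about, so that labelled points accumulate near the decision boundary. Every name here (`oracle_label`, `n_seed`, `n_queries`) is our own illustrative assumption; this is not the authors' algorithm, whose query rule comes from their homology-recovery analysis.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

def active_query_loop(X_pool, oracle_label, n_seed=10, n_queries=100):
    """Hypothetical uncertainty-sampling loop: repeatedly query the
    unlabelled point the current classifier is least sure about."""
    rng = np.random.default_rng(0)
    # Seed with a few randomly labelled points.
    idx = list(rng.choice(len(X_pool), size=n_seed, replace=False))
    y = {i: oracle_label(X_pool[i]) for i in idx}
    clf = KNeighborsClassifier(n_neighbors=3)
    for _ in range(n_queries):
        labelled = np.array(idx)
        clf.fit(X_pool[labelled], [y[i] for i in idx])
        # Uncertainty = closeness of the class-0 probability to 0.5.
        proba = clf.predict_proba(X_pool)[:, 0]
        uncertainty = -np.abs(proba - 0.5)
        uncertainty[labelled] = -np.inf   # never re-query a labelled point
        q = int(np.argmax(uncertainty))
        idx.append(q)
        y[q] = oracle_label(X_pool[q])    # ask the labelling oracle
    return idx, y
```

In the paper's pipeline, the points labelled this way would feed the homological summary of the decision boundary; this sketch simply returns the queried indices and labels.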
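The MNIST/CIFAR10 sample sizes quoted in the Dataset Splits row (200 train / 2,000 test / 2,000 unlabelled pool) pin down only the subset sizes, not how the subsets were drawn. A minimal sketch of one plausible partition, assuming disjoint uniform random subsets and a seed of our own choosing:

```python
import numpy as np

def split_indices(n_total, n_train=200, n_test=2000, n_pool=2000, seed=0):
    """Draw disjoint train/test/unlabelled-pool index sets of the sizes
    quoted in the paper; disjointness and seeding are our assumptions."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(n_total)
    train = perm[:n_train]
    test = perm[n_train:n_train + n_test]
    pool = perm[n_train + n_test:n_train + n_test + n_pool]
    return train, test, pool

# Example: partitioning the 60,000-image MNIST training set.
train_idx, test_idx, pool_idx = split_indices(60000)
```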
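The Python wrapper of Ripser that the paper cites (distributed via Scikit-tda as `ripser.py`) computes persistence diagrams from a point cloud. A minimal usage sketch on a toy noisy circle, whose H1 diagram should show one prominent loop; the paper gives no versions, and this call signature reflects recent releases of the package:

```python
import numpy as np
from ripser import ripser  # python wrapper cited by the paper (Scikit-tda)

# Toy point cloud: a noisy circle sampled at 100 points.
rng = np.random.default_rng(0)
theta = rng.uniform(0, 2 * np.pi, 100)
X = np.c_[np.cos(theta), np.sin(theta)] + 0.05 * rng.normal(size=(100, 2))

# Persistence diagrams up to dimension 1: [H0 diagram, H1 diagram].
diagrams = ripser(X, maxdim=1)["dgms"]
print("H1 features (birth, death):", diagrams[1])
```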
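The only concrete range quoted in the Experiment Setup row is k-NN with k from 1 to 29. A minimal sketch of such a sweep; the dataset, split, and accuracy scoring below are our own stand-ins, not the paper's configuration:

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Stand-in dataset and split; the paper does not specify either.
X, y = load_digits(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.5, random_state=0)

scores = {}
for k in range(1, 30):  # k = 1, ..., 29, as quoted in the paper
    clf = KNeighborsClassifier(n_neighbors=k).fit(X_tr, y_tr)
    scores[k] = clf.score(X_te, y_te)

best_k = max(scores, key=scores.get)
print(f"best k = {best_k}, accuracy = {scores[best_k]:.3f}")
```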