Active Covering
Authors: Heinrich Jiang, Afshin Rostamizadeh
ICML 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Finally, we show that the active learning method consistently outperforms offline methods as well as a variety of baselines on a wide range of benchmark image-based datasets. In Section 6, we show empirical results on a wide range of benchmark image-based datasets (Letters, MNIST, Fashion MNIST, CIFAR10, Celeb A) comparing the Explore-then-Commit algorithm to a number of offline and active baselines. |
| Researcher Affiliation | Industry | Heinrich Jiang¹, Afshin Rostamizadeh¹ (¹Google Research). Correspondence to: Heinrich Jiang <heinrichj@google.com>. |
| Pseudocode | Yes | Algorithm 1 Offline Learner; Algorithm 2 Active Explore-then-Commit Learner |
| Open Source Code | No | The paper does not provide any statements about open-source code availability or links to code repositories. |
| Open Datasets | Yes | 1: UCI Letters Recognition (Dua & Graff, 2017)... 2: MNIST... 3: Fashion MNIST (Xiao et al., 2017)... 4: CIFAR10... 5: SVHN (Netzer et al., 2011)... 6: Celeb A (Liu et al., 2018) |
| Dataset Splits | Yes | For these methods, we perform 5-fold cross-validation on the initial sample using accuracy as the metric (these methods as implemented in scikit-learn have predict methods which classify whether an example is an outlier relative to the positive class). (See the cross-validation sketch following the table.) |
| Hardware Specification | Yes | averaged across 100 runs randomizing over different initial samples and ran on a cluster of NVIDIA™ Tesla™ V100 Tensor Core GPUs. |
| Software Dependencies | No | The paper mentions 'one-class classification methods implemented in scikit-learn', but does not provide specific version numbers for scikit-learn or any other software dependencies. |
| Experiment Setup | Yes | We fix the initial sample size to a random stratified sample of 100 datapoints. We train a neural network on the initial sample and use the activations of the second-last layer... we let the batch size be 5% of the remainder of the dataset... For the SVM methods, we tune the gamma parameter (kernel coefficient) and nu... For Isolation Forest, we tune the number of estimators... For Robust Covariance, we tune the proportion of contamination... For all of these aforementioned hyperparameters, we search over a grid of powers of two. (Illustrative sketches of this setup follow the table.) |
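
The paper does not release code, so the feature-extraction step quoted in the Experiment Setup row can only be approximated. The sketch below is a minimal PyTorch illustration, assuming a small fully connected classifier; the architecture, input dimensionality, and training schedule are placeholders rather than the authors' configuration. It trains on an initial labelled sample and exposes the second-last-layer activations as downstream features.

```python
# Minimal sketch (assumed architecture, not the paper's): train a small network on the
# 100-point initial sample and reuse its second-last-layer activations as features.
import torch
import torch.nn as nn

class SmallNet(nn.Module):
    def __init__(self, in_dim, n_classes, hidden=64):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU(),
                                  nn.Linear(hidden, hidden), nn.ReLU())
        self.head = nn.Linear(hidden, n_classes)

    def forward(self, x):
        return self.head(self.body(x))

    def embed(self, x):
        # Activations of the second-last layer, used as features for the baselines below.
        return self.body(x)

X = torch.randn(100, 784)            # stands in for the initial stratified sample
y = torch.randint(0, 10, (100,))     # stands in for its labels
net = SmallNet(in_dim=784, n_classes=10)
opt = torch.optim.Adam(net.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()
for _ in range(200):                 # short training loop on the initial sample only
    opt.zero_grad()
    loss_fn(net(X), y).backward()
    opt.step()
with torch.no_grad():
    features = net.embed(X)          # second-last-layer embeddings
```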
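
The tuning of the one-class baselines quoted in the Dataset Splits and Experiment Setup rows (5-fold cross-validation on the initial sample, accuracy of the inlier/outlier `predict` output as the metric, powers-of-two grids) can be sketched as follows. The grid ranges, the random seed, and the restriction to `OneClassSVM` are assumptions for illustration; the paper tunes Isolation Forest and Robust Covariance in the same way.

```python
# Hedged sketch of the baseline tuning: 5-fold CV on the (positive-class) initial
# sample, scoring each powers-of-two hyperparameter setting by the fraction of
# held-out points that predict() marks as inliers (+1). Grids here are illustrative.
import numpy as np
from sklearn.model_selection import KFold
from sklearn.svm import OneClassSVM

def cv_accuracy(make_model, X, n_splits=5, seed=0):
    """Mean fraction of held-out positives predicted as inliers (+1)."""
    kf = KFold(n_splits=n_splits, shuffle=True, random_state=seed)
    scores = []
    for train_idx, val_idx in kf.split(X):
        model = make_model().fit(X[train_idx])
        scores.append(np.mean(model.predict(X[val_idx]) == 1))
    return float(np.mean(scores))

gamma_grid = [2.0 ** k for k in range(-7, 3)]   # kernel coefficient, powers of two
nu_grid = [2.0 ** k for k in range(-7, 0)]      # nu must lie in (0, 1]

X_init = np.random.randn(100, 64)               # stands in for the learned embeddings

best = max(
    ((g, n, cv_accuracy(lambda g=g, n=n: OneClassSVM(kernel="rbf", gamma=g, nu=n), X_init))
     for g in gamma_grid for n in nu_grid),
    key=lambda t: t[2],
)
print("best (gamma, nu, CV accuracy):", best)
```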