Efficient and Parsimonious Agnostic Active Learning

Authors: Tzu-Kuo Huang, Alekh Agarwal, Daniel J. Hsu, John Langford, Robert E. Schapire

NeurIPS 2015 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We conduct an extensive study in Section 5 by simulating the interaction of the active learning algorithm with a streaming supervised dataset. Results on a wide array of datasets show that agnostic active learning typically outperforms passive learning, and the magnitude of improvement depends on how carefully the active-learning hyper-parameters are chosen.
Researcher Affiliation | Collaboration | Tzu-Kuo Huang (Microsoft Research, NYC) tkhuang@microsoft.com; Alekh Agarwal (Microsoft Research, NYC) alekha@microsoft.com; Daniel Hsu (Columbia University) djhsu@cs.columbia.edu; John Langford (Microsoft Research, NYC) jcl@microsoft.com; Robert E. Schapire (Microsoft Research, NYC) schapire@microsoft.com
Pseudocode | Yes | Algorithm 1 ACTIVE COVER (AC). input: Constants c1, c2, c3, confidence δ, error radius γ, parameters α, β, ξ for (OP), epoch schedule 0 = τ0 < 3 = τ1 < τ2 < τ3 < … < τM satisfying τm+1 ≤ 2τm for m ≥ 1. initialize: epoch m = 0, Z0 := ∅, ∆0 := c1√ϵ1 + c2 ϵ1 log 3, where ϵm := 32 log(|H|τm/δ)/τm.
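For concreteness, a minimal sketch in Python of the quantities quoted above: the doubling epoch schedule and the ϵm/∆0 computations. Names are illustrative, and c1, c2 are the unspecified constants from the paper's analysis; this is not the authors' implementation.

    import math

    def epoch_schedule(M):
        """Epoch schedule 0 = tau_0 < 3 = tau_1 < ... < tau_M with tau_{m+1} <= 2 * tau_m."""
        taus = [0, 3]
        for _ in range(2, M + 1):
            taus.append(2 * taus[-1])  # doubling meets the constraint with equality
        return taus

    def eps(tau_m, num_hypotheses, delta):
        """eps_m := 32 log(|H| tau_m / delta) / tau_m, from Algorithm 1's initialization."""
        return 32.0 * math.log(num_hypotheses * tau_m / delta) / tau_m

    def delta_0(c1, c2, num_hypotheses, delta):
        """Delta_0 := c1 * sqrt(eps_1) + c2 * eps_1 * log(3), using tau_1 = 3."""
        eps_1 = eps(3, num_hypotheses, delta)
        return c1 * math.sqrt(eps_1) + c2 * eps_1 * math.log(3)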
Open Source Code | No | "We implemented these algorithms in Vowpal Wabbit (http://hunch.net/~vw/), a fast learning system based on online convex optimization, using logistic regression as the ERM oracle." This refers to the use of a third-party tool, not a release of the authors' own implementation code.
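As an illustration only (not the authors' exact commands, and with hypothetical file names), training a logistic-regression model with the Vowpal Wabbit command-line tool can be driven from Python roughly as follows:

    import subprocess

    # Train on VW-format data with logistic loss, saving the model to disk.
    subprocess.run(["vw", "-d", "train.vw", "--loss_function", "logistic",
                    "-f", "model.vw"], check=True)

    # Score held-out data in test-only mode (-t) with the saved model.
    subprocess.run(["vw", "-d", "test.vw", "-i", "model.vw", "-t",
                    "-p", "predictions.txt"], check=True)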
Open Datasets | No | "We performed experiments on 22 binary classification datasets with varying sizes (10^3 to 10^6) and diverse feature characteristics. Details about the datasets are in Appendix G.1 of [14]." The paper states that dataset details are in an appendix of its longer version, but does not provide specific names, citations, or links for these 22 datasets within the main body of the provided text.
Dataset Splits | No | "To simulate the streaming setting, we randomly permuted the datasets, ran the active learning algorithms through the first 80% of data, and evaluated the learned classifiers on the remaining 20%." This describes a training and test split, but does not explicitly mention a validation split.
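A minimal sketch of this evaluation protocol, assuming a streaming learner object with hypothetical observe/predict methods (the authors' actual runs used Vowpal Wabbit):

    import numpy as np

    def simulate_stream(X, y, learner, train_frac=0.8, seed=0):
        """Randomly permute the dataset, stream the first 80% to the
        active learner, and report accuracy on the remaining 20%."""
        rng = np.random.default_rng(seed)
        perm = rng.permutation(len(X))
        cut = int(train_frac * len(X))
        train_idx, test_idx = perm[:cut], perm[cut:]
        for i in train_idx:                # one example at a time, as in a stream
            learner.observe(X[i], y[i])    # hypothetical interface; the learner decides whether to query the label
        preds = np.array([learner.predict(X[i]) for i in test_idx])
        return float(np.mean(preds == y[test_idx]))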
Hardware Specification | No | The paper does not provide specific details regarding the hardware used for the experiments, such as GPU models, CPU types, or cloud computing specifications.
Software Dependencies | No | "We implemented these algorithms in Vowpal Wabbit (http://hunch.net/~vw/), a fast learning system based on online convex optimization, using logistic regression as the ERM oracle." This names a software tool but does not provide a specific version number.
Experiment Setup | No | "More details about hyper-parameters are in Appendix G.2 of [14]." The paper defers the specific experimental setup details, including hyper-parameters, to an appendix in its longer version rather than providing them in the main text.