Efficient and Parsimonious Agnostic Active Learning

Authors: Tzu-Kuo Huang, Alekh Agarwal, Daniel J. Hsu, John Langford, Robert E. Schapire

NeurIPS 2015 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We conduct an extensive study in Section 5 by simulating the interaction of the active learning algorithm with a streaming supervised dataset. Results on a wide array of datasets show that agnostic active learning typically outperforms passive learning, and the magnitude of improvement depends on how carefully the active-learning hyper-parameters are chosen.
Researcher Affiliation | Collaboration | Tzu-Kuo Huang (Microsoft Research, NYC) tkhuang@microsoft.com; Alekh Agarwal (Microsoft Research, NYC) alekha@microsoft.com; Daniel Hsu (Columbia University) djhsu@cs.columbia.edu; John Langford (Microsoft Research, NYC) jcl@microsoft.com; Robert E. Schapire (Microsoft Research, NYC) schapire@microsoft.com
Pseudocode | Yes | Algorithm 1 ACTIVE COVER (AC). input: Constants c1, c2, c3, confidence δ, error radius γ, parameters α, β, ξ for (OP), epoch schedule 0 = τ0 < 3 = τ1 < τ2 < τ3 < … < τM satisfying τm+1 ≤ 2τm for m ≥ 1. initialize: epoch m = 0, Z0 := ∅, ∆0 := c1√ϵ1 + c2 ϵ1 log 3, where ϵm := 32 log(|H|τm/δ)/τm.
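For concreteness, a minimal sketch in Python of the quantities quoted above: the doubling epoch schedule and the ϵm/∆0 computations. Names are illustrative, and c1, c2 are the unspecified constants from the paper's analysis; this is not the authors' implementation.

    import math

    def epoch_schedule(M):
        """Epoch schedule 0 = tau_0 < 3 = tau_1 < ... < tau_M with tau_{m+1} <= 2 * tau_m."""
        taus = [0, 3]
        for _ in range(2, M + 1):
            taus.append(2 * taus[-1])  # doubling meets the constraint with equality
        return taus

    def eps(tau_m, num_hypotheses, delta):
        """eps_m := 32 log(|H| tau_m / delta) / tau_m, from Algorithm 1's initialization."""
        return 32.0 * math.log(num_hypotheses * tau_m / delta) / tau_m

    def delta_0(c1, c2, num_hypotheses, delta):
        """Delta_0 := c1 * sqrt(eps_1) + c2 * eps_1 * log(3), using tau_1 = 3."""
        eps_1 = eps(3, num_hypotheses, delta)
        return c1 * math.sqrt(eps_1) + c2 * eps_1 * math.log(3)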
Open Source Code | No | "We implemented these algorithms in Vowpal Wabbit (http://hunch.net/~vw/), a fast learning system based on online convex optimization, using logistic regression as the ERM oracle." This refers to the use of a third-party tool, not a release of the authors' own implementation code.
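As an illustration only (not the authors' exact commands, and with hypothetical file names), training a logistic-regression model with the Vowpal Wabbit command-line tool can be driven from Python roughly as follows:

    import subprocess

    # Train on VW-format data with logistic loss, saving the model to disk.
    subprocess.run(["vw", "-d", "train.vw", "--loss_function", "logistic",
                    "-f", "model.vw"], check=True)

    # Score held-out data in test-only mode (-t) with the saved model.
    subprocess.run(["vw", "-d", "test.vw", "-i", "model.vw", "-t",
                    "-p", "predictions.txt"], check=True)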
Open Datasets | No | "We performed experiments on 22 binary classification datasets with varying sizes (10^3 to 10^6) and diverse feature characteristics. Details about the datasets are in Appendix G.1 of [14]." The paper states that dataset details are in an appendix of its longer version, but does not provide specific names, citations, or links for these 22 datasets within the main body of the provided text.
Dataset Splits | No | "To simulate the streaming setting, we randomly permuted the datasets, ran the active learning algorithms through the first 80% of data, and evaluated the learned classifiers on the remaining 20%." This describes a training and test split, but does not explicitly mention a validation split.
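A minimal sketch of this evaluation protocol, assuming a streaming learner object with hypothetical observe/predict methods (the authors' actual runs used Vowpal Wabbit):

    import numpy as np

    def simulate_stream(X, y, learner, train_frac=0.8, seed=0):
        """Randomly permute the dataset, stream the first 80% to the
        active learner, and report accuracy on the remaining 20%."""
        rng = np.random.default_rng(seed)
        perm = rng.permutation(len(X))
        cut = int(train_frac * len(X))
        train_idx, test_idx = perm[:cut], perm[cut:]
        for i in train_idx:                # one example at a time, as in a stream
            learner.observe(X[i], y[i])    # hypothetical interface; the learner decides whether to query the label
        preds = np.array([learner.predict(X[i]) for i in test_idx])
        return float(np.mean(preds == y[test_idx]))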
Hardware Specification | No | The paper does not provide specific details regarding the hardware used for the experiments, such as GPU models, CPU types, or cloud computing specifications.
Software Dependencies | No | "We implemented these algorithms in Vowpal Wabbit (http://hunch.net/~vw/), a fast learning system based on online convex optimization, using logistic regression as the ERM oracle." This names a software tool but does not provide a specific version number.
Experiment Setup | No | "More details about hyper-parameters are in Appendix G.2 of [14]." The paper defers the specific experimental setup details, including hyper-parameters, to an appendix in its longer version rather than providing them in the main text.