Efficient and Parsimonious Agnostic Active Learning
Authors: Tzu-Kuo Huang, Alekh Agarwal, Daniel J. Hsu, John Langford, Robert E. Schapire
NIPS 2015 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct an extensive study in Section 5 by simulating the interaction of the active learning algorithm with a streaming supervised dataset. Results on a wide array of datasets show that agnostic active learning typically outperforms passive learning, and the magnitude of improvement depends on how carefully the active learning hyper-parameters are chosen. |
| Researcher Affiliation | Collaboration | Tzu-Kuo Huang (Microsoft Research, NYC) tkhuang@microsoft.com; Alekh Agarwal (Microsoft Research, NYC) alekha@microsoft.com; Daniel Hsu (Columbia University) djhsu@cs.columbia.edu; John Langford (Microsoft Research, NYC) jcl@microsoft.com; Robert E. Schapire (Microsoft Research, NYC) schapire@microsoft.com |
| Pseudocode | Yes | "Algorithm 1 ACTIVE COVER (AC). input: constants c_1, c_2, c_3, confidence δ, error radius γ, parameters α, β, ξ for (OP), epoch schedule 0 = τ_0 < 3 = τ_1 < τ_2 < · · · < τ_M satisfying τ_{m+1} ≤ 2τ_m for m ≥ 1. initialize: epoch m = 0, Z_0 := ∅, ∆_0 := c_1 √ϵ_1 + c_2 ϵ_1 log 3, where ϵ_m := 32 log(\|H\| τ_m / δ) / τ_m." A runnable sketch of this epoch schedule appears after the table. |
| Open Source Code | No | "We implemented these algorithms in Vowpal Wabbit (http://hunch.net/~vw/), a fast learning system based on online convex optimization, using logistic regression as the ERM oracle." This refers to using a third-party tool, not to a release of the authors' own implementation code. |
| Open Datasets | No | "We performed experiments on 22 binary classification datasets with varying sizes (10^3 to 10^6) and diverse feature characteristics. Details about the datasets are in Appendix G.1 of [14]." The paper states that details are in an appendix of its longer version but does not provide specific names, citations, or links for the 22 datasets in the main body of the provided text. |
| Dataset Splits | No | "To simulate the streaming setting, we randomly permuted the datasets, ran the active learning algorithms through the first 80% of data, and evaluated the learned classifiers on the remaining 20%." This describes a training/test split but does not mention a validation split (see the split-simulation sketch after the table). |
| Hardware Specification | No | The paper does not provide specific details regarding the hardware used for experiments, such as GPU models, CPU types, or cloud computing specifications. |
| Software Dependencies | No | "We implemented these algorithms in Vowpal Wabbit (http://hunch.net/~vw/), a fast learning system based on online convex optimization, using logistic regression as the ERM oracle." This names a software tool but does not provide a specific version number. |
| Experiment Setup | No | "More details about hyper-parameters are in Appendix G.2 of [14]." The paper defers specific experimental setup details, including hyperparameters, to an appendix of its longer version rather than stating them in the main text. |
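
To make the quoted epoch schedule and initialization concrete, here is a minimal Python sketch. It is not the authors' implementation: the doubling schedule is just one schedule satisfying the quoted constraint τ_{m+1} ≤ 2τ_m, and `num_hypotheses` stands in for |H|.

```python
import math

def epoch_schedule(n, tau1=3):
    """Build an epoch schedule 0 = tau_0 < 3 = tau_1 < ... <= tau_M = n
    satisfying tau_{m+1} <= 2 * tau_m (simple doubling is an assumption;
    the quoted pseudocode only constrains the growth rate)."""
    taus = [0, tau1]
    while taus[-1] < n:
        taus.append(min(2 * taus[-1], n))
    return taus

def epsilon_m(m, taus, num_hypotheses, delta):
    """eps_m := 32 * log(|H| * tau_m / delta) / tau_m, from the quoted
    initialization of Algorithm 1 (ACTIVE COVER)."""
    tau_m = taus[m]
    return 32.0 * math.log(num_hypotheses * tau_m / delta) / tau_m

taus = epoch_schedule(n=1000)
print(taus[:6])                                    # [0, 3, 6, 12, 24, 48]
print(epsilon_m(1, taus, num_hypotheses=100, delta=0.05))
```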
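
The evaluation protocol in the Dataset Splits row is also easy to reproduce. The sketch below is a hedged illustration: `learner` is a hypothetical object with `query`, `update`, and `predict` methods, not an interface from the paper or from Vowpal Wabbit.

```python
import numpy as np

def simulate_stream(X, y, learner, train_frac=0.8, seed=0):
    """Randomly permute the dataset, stream the first 80% to the active
    learner, and evaluate on the remaining 20% (no validation split,
    matching the protocol quoted in the table)."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(len(y))
    X, y = X[perm], y[perm]
    n_train = int(train_frac * len(y))
    for x_t, y_t in zip(X[:n_train], y[:n_train]):
        # The active learner decides online whether to request this label.
        if learner.query(x_t):
            learner.update(x_t, y_t)
    preds = learner.predict(X[n_train:])
    return float(np.mean(preds == y[n_train:]))    # held-out accuracy
```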