Anytime Active Learning

Authors: Maria Ramirez-Loaiza, Aron Culotta, Mustafa Bilgic

Venue: AAAI 2014

Reproducibility assessment: each variable is listed with its result and the supporting LLM response.
Research Type: Experimental. We conduct user studies on two document classification datasets and develop simulated annotators that mimic the users. Our simulated experiments show that anytime active learning outperforms several baselines on these two datasets.
Researcher Affiliation: Academia. Maria E. Ramirez-Loaiza, Aron Culotta, and Mustafa Bilgic, Illinois Institute of Technology, Chicago, IL 60616; mramire8@hawk.iit.edu, {aculotta, mbilgic}@iit.edu.
Pseudocode: Yes. The paper presents Algorithm 1, "Static Anytime Active Learning".
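For orientation, here is a minimal sketch of a static anytime active-learning loop in the spirit of Algorithm 1, assuming an uncertainty-based selector, a fixed subinstance size k (the static variant truncates every queried document to the same length), and an oracle that may return a neutral answer (None). All names and defaults below are illustrative, not taken from the paper.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def static_aal(texts, unlabeled, labeled, labels, vectorize, oracle,
               k=25, rounds=100, subsample=250, seed=0):
    """Sketch of static anytime active learning: each query shows the
    oracle only the first k words of the chosen document (a subinstance)."""
    rng = np.random.default_rng(seed)
    model = LogisticRegression(penalty="l1", C=1.0, solver="liblinear")
    for _ in range(rounds):
        model.fit(vectorize([texts[i] for i in labeled]), labels)
        # Uncertainty sampling over a random subsample of the unlabeled pool.
        cand = rng.choice(sorted(unlabeled),
                          size=min(subsample, len(unlabeled)), replace=False)
        proba = model.predict_proba(vectorize([texts[i] for i in cand]))
        pick = int(cand[np.argmin(np.abs(proba[:, 1] - 0.5))])
        # Truncate to the first k words before asking the annotator.
        answer = oracle(" ".join(texts[pick].split()[:k]))
        unlabeled.discard(pick)
        if answer is not None:  # a neutral answer adds nothing to the labeled set
            labeled.append(pick)
            labels.append(answer)
    return model
```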
Open Source Code: No. The paper does not provide any explicit statements or links indicating the release of open-source code for the described methodology.
Open Datasets: Yes. Our experiments use two datasets: (1) IMDB, a collection of 50K reviews from IMDB.com labeled with positive or negative sentiment (Maas et al. 2011); (2) SRAA, a collection of 73K Usenet articles labeled as relating to aviation or autos (Nigam et al. 1998).
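Both corpora are publicly available. As one hypothetical starting point, the Maas et al. IMDB archive unpacks into pos/ and neg/ directories that scikit-learn can read directly; the local path below is an assumption.

```python
from sklearn.datasets import load_files

# Hypothetical local path to the unpacked Large Movie Review dataset
# (Maas et al. 2011); the pos/ and neg/ folder names become the two labels.
imdb = load_files("aclImdb/train", categories=["pos", "neg"], encoding="utf-8")
texts, labels = imdb.data, list(imdb.target)
```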
Dataset Splits: No. The paper states 'We reserve half of the data for testing, and use the remaining to simulate active learning' and mentions 'held-out data' for oracle simulation, but it does not specify a distinct validation set or explicit train/validation/test splits for the main model training.
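The one split the paper does describe, reserving half the data for testing, is straightforward to reproduce; the random seed below is an arbitrary choice, and no validation set is carved out because none is mentioned.

```python
from sklearn.model_selection import train_test_split

# 50/50 split as described in the paper; the pool half feeds active learning.
pool_texts, test_texts, pool_labels, test_labels = train_test_split(
    texts, labels, test_size=0.5, random_state=42, stratify=labels)
```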
Hardware Specification: No. The paper does not provide specific hardware details, such as GPU or CPU models or memory amounts, used for running its experiments.
Software Dependencies: No. The paper describes using a 'logistic regression classifier with L1 regularization' but does not name software packages or version numbers needed for reproducibility.
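The classifier itself is unambiguous even without version numbers. In scikit-learn (version assumed, since none is given), L1-regularized logistic regression only requires choosing an L1-capable solver:

```python
from sklearn.linear_model import LogisticRegression

# The paper's student model: L1 regularization with the default C = 1.
# 'liblinear' and 'saga' are the scikit-learn solvers that support L1.
student = LogisticRegression(penalty="l1", C=1.0, solver="liblinear")
```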
Experiment Setup: Yes. For the student, we use a logistic regression classifier with L1 regularization using the default parameter C = 1, seeded with a labeled set of two examples. At each round of active learning, a subsample of 250 examples is selected uniformly from the unlabeled set U. For each of the datasets, we set C and T (which parameterize the simulated annotator, distinct from the student's regularization parameter) so that the distribution of neutral labels by subinstance size most closely matches the results of the user study. We searched values C ∈ [0.001, 3] with step 0.001 and T ∈ [0.3, 0.45] with step 0.05, selecting C = 0.3, T = 0.4 for IMDB and C = 0.01, T = 0.3 for SRAA.
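A sketch of that parameter search follows, assuming a hypothetical neutral_distribution(C, T) helper that returns the simulated annotator's fraction of neutral answers at each subinstance size, and a target array measured in the user study. The L1 distance used here is an arbitrary choice; the paper does not specify its matching criterion.

```python
import numpy as np

# Grids from the paper: C in [0.001, 3] (step 0.001), T in [0.3, 0.45] (step 0.05).
C_GRID = np.arange(0.001, 3.0005, 0.001)
T_GRID = np.arange(0.30, 0.451, 0.05)

def fit_annotator_params(target, neutral_distribution):
    """Return the (C, T) whose simulated neutral-label distribution is
    closest (in L1 distance) to the user-study distribution `target`."""
    best, best_err = None, np.inf
    for C in C_GRID:
        for T in T_GRID:
            err = float(np.abs(neutral_distribution(C, T) - target).sum())
            if err < best_err:
                best, best_err = (C, T), err
    return best  # the paper reports C=0.3, T=0.4 (IMDB) and C=0.01, T=0.3 (SRAA)
```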