Anytime Active Learning

Authors: Maria Ramirez-Loaiza, Aron Culotta, Mustafa Bilgic

Venue: AAAI 2014

Reproducibility assessment: each variable is listed with its result and the supporting LLM response.
Research Type: Experimental. We conduct user studies on two document classification datasets and develop simulated annotators that mimic the users. Our simulated experiments show that anytime active learning outperforms several baselines on these two datasets.
Researcher Affiliation: Academia. Maria E. Ramirez-Loaiza, Aron Culotta, and Mustafa Bilgic, Illinois Institute of Technology, Chicago, IL 60616; mramire8@hawk.iit.edu, {aculotta, mbilgic}@iit.edu.
Pseudocode: Yes. The paper presents Algorithm 1, "Static Anytime Active Learning".
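For orientation, here is a minimal sketch of a static anytime active-learning loop in the spirit of Algorithm 1, assuming an uncertainty-based selector, a fixed subinstance size k (the static variant truncates every queried document to the same length), and an oracle that may return a neutral answer (None). All names and defaults below are illustrative, not taken from the paper.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def static_aal(texts, unlabeled, labeled, labels, vectorize, oracle,
               k=25, rounds=100, subsample=250, seed=0):
    """Sketch of static anytime active learning: each query shows the
    oracle only the first k words of the chosen document (a subinstance)."""
    rng = np.random.default_rng(seed)
    model = LogisticRegression(penalty="l1", C=1.0, solver="liblinear")
    for _ in range(rounds):
        model.fit(vectorize([texts[i] for i in labeled]), labels)
        # Uncertainty sampling over a random subsample of the unlabeled pool.
        cand = rng.choice(sorted(unlabeled),
                          size=min(subsample, len(unlabeled)), replace=False)
        proba = model.predict_proba(vectorize([texts[i] for i in cand]))
        pick = int(cand[np.argmin(np.abs(proba[:, 1] - 0.5))])
        # Truncate to the first k words before asking the annotator.
        answer = oracle(" ".join(texts[pick].split()[:k]))
        unlabeled.discard(pick)
        if answer is not None:  # a neutral answer adds nothing to the labeled set
            labeled.append(pick)
            labels.append(answer)
    return model
```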
Open Source Code: No. The paper does not provide any explicit statements or links indicating the release of open-source code for the described methodology.
Open Datasets: Yes. Our experiments use two datasets: (1) IMDB, a collection of 50K reviews from IMDB.com labeled with positive or negative sentiment (Maas et al. 2011); (2) SRAA, a collection of 73K Usenet articles labeled as relating to aviation or autos (Nigam et al. 1998).
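Both corpora are publicly available. As one hypothetical starting point, the Maas et al. IMDB archive unpacks into pos/ and neg/ directories that scikit-learn can read directly; the local path below is an assumption.

```python
from sklearn.datasets import load_files

# Hypothetical local path to the unpacked Large Movie Review dataset
# (Maas et al. 2011); the pos/ and neg/ folder names become the two labels.
imdb = load_files("aclImdb/train", categories=["pos", "neg"], encoding="utf-8")
texts, labels = imdb.data, list(imdb.target)
```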
Dataset Splits: No. The paper states 'We reserve half of the data for testing, and use the remaining to simulate active learning' and mentions 'held-out data' for oracle simulation, but it does not specify a distinct validation set or explicit train/validation/test splits for the main model training.
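The one split the paper does describe, reserving half the data for testing, is straightforward to reproduce; the random seed below is an arbitrary choice, and no validation set is carved out because none is mentioned.

```python
from sklearn.model_selection import train_test_split

# 50/50 split as described in the paper; the pool half feeds active learning.
pool_texts, test_texts, pool_labels, test_labels = train_test_split(
    texts, labels, test_size=0.5, random_state=42, stratify=labels)
```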
Hardware Specification: No. The paper does not provide specific hardware details, such as GPU or CPU models or memory amounts, used for running its experiments.
Software Dependencies: No. The paper describes using a 'logistic regression classifier with L1 regularization' but does not name software packages or version numbers needed for reproducibility.
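The classifier itself is unambiguous even without version numbers. In scikit-learn (version assumed, since none is given), L1-regularized logistic regression only requires choosing an L1-capable solver:

```python
from sklearn.linear_model import LogisticRegression

# The paper's student model: L1 regularization with the default C = 1.
# 'liblinear' and 'saga' are the scikit-learn solvers that support L1.
student = LogisticRegression(penalty="l1", C=1.0, solver="liblinear")
```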
Experiment Setup: Yes. For the student, we use a logistic regression classifier with L1 regularization using the default parameter C = 1, seeded with a labeled set of two examples. At each round of active learning, a subsample of 250 examples is selected uniformly from the unlabeled set U. For each of the datasets, we set C and T (which parameterize the simulated annotator, distinct from the student's regularization parameter) so that the distribution of neutral labels by subinstance size most closely matches the results of the user study. We searched values C ∈ [0.001, 3] with step 0.001 and T ∈ [0.3, 0.45] with step 0.05, selecting C = 0.3, T = 0.4 for IMDB and C = 0.01, T = 0.3 for SRAA.
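A sketch of that parameter search follows, assuming a hypothetical neutral_distribution(C, T) helper that returns the simulated annotator's fraction of neutral answers at each subinstance size, and a target array measured in the user study. The L1 distance used here is an arbitrary choice; the paper does not specify its matching criterion.

```python
import numpy as np

# Grids from the paper: C in [0.001, 3] (step 0.001), T in [0.3, 0.45] (step 0.05).
C_GRID = np.arange(0.001, 3.0005, 0.001)
T_GRID = np.arange(0.30, 0.451, 0.05)

def fit_annotator_params(target, neutral_distribution):
    """Return the (C, T) whose simulated neutral-label distribution is
    closest (in L1 distance) to the user-study distribution `target`."""
    best, best_err = None, np.inf
    for C in C_GRID:
        for T in T_GRID:
            err = float(np.abs(neutral_distribution(C, T) - target).sum())
            if err < best_err:
                best, best_err = (C, T), err
    return best  # the paper reports C=0.3, T=0.4 (IMDB) and C=0.01, T=0.3 (SRAA)
```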