Re-Active Learning: Active Learning with Relabeling

Authors: Christopher H. Lin, Mausam, Daniel S. Weld

AAAI 2016

Reproducibility assessment (variable, result, and supporting LLM response):
Research Type: Experimental
LLM Response: "We conduct empirical experiments on both synthetic and real-world datasets, showing that our new algorithms significantly outperform traditional active learning techniques and other natural baselines on the problem of re-active learning."
Researcher Affiliation: Academia
LLM Response: "Christopher H. Lin, University of Washington, Seattle, WA (chrislin@cs.washington.edu); Mausam, Indian Institute of Technology, Delhi, India (mausam@cse.iitd.ac.in); Daniel S. Weld, University of Washington, Seattle, WA (weld@cs.washington.edu)"
Pseudocode: Yes
LLM Response: "Algorithm 1 describes the framework for computing the impact of an example x_i."
Open Source Code: No
LLM Response: The paper does not provide any explicit statements about releasing source code or links to a code repository.
Open Datasets: Yes
LLM Response: "Figure 3 compares impact sampling against uncertainty sampling on datasets from the UCI Machine Learning Repository (Bache and Lichman 2013) with synthetically-generated labels." "We first consider a popular dataset (Kamar, Hacker, and Horvitz 2012; Lintott et al. 2011; 2008) from Galaxy Zoo."
Dataset Splits: Yes
LLM Response: "We seed training with 50 examples, use a total budget of 1,000, and test on 300 held-out examples." "We randomly generate a training set of 70% of the examples, a held-out set of 15% of the examples, and a test set of the remaining 15% of the examples."
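The 70%/15%/15% random split described above can be sketched in a few lines of Python. This is an illustrative reconstruction, not the authors' code; the function name and fixed seed are assumptions.

```python
import random

def split_70_15_15(examples, seed=0):
    """Randomly partition examples into 70% train, 15% held-out,
    and 15% test, mirroring the split described in the paper."""
    rng = random.Random(seed)
    shuffled = examples[:]
    rng.shuffle(shuffled)
    n = len(shuffled)
    n_train = int(0.70 * n)
    n_held = int(0.15 * n)
    train = shuffled[:n_train]
    held_out = shuffled[n_train:n_train + n_held]
    test = shuffled[n_train + n_held:]
    return train, held_out, test

train, held, test = split_70_15_15(list(range(1000)))
print(len(train), len(held), len(test))  # 700 150 150
```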
Hardware Specification: Yes
LLM Response: "Experiments are programmed in Python using an Intel Xeon E7-4850 v2 processor (2.3 GHz, 24M cache) with 512 GB of RAM."
Software Dependencies: No
LLM Response: The paper mentions that experiments are "programmed in Python" but does not specify version numbers for Python or for any software libraries used.
Experiment Setup: Yes
LLM Response: "We seed training with 50 examples, use a total budget of 1,000, and test on 300 held-out examples." "We vary the number of features among {10, 30, 50, 70, 90}." "We assume the classification noise model (Angluin and Laird 1988): each label is independently flipped from the true label, h*(x), with probability 0.25. We assume that label accuracy is known and use majority vote for f, the label aggregation function." "We randomly generate a training set of 70% of the examples, a held-out set of 15% of the examples, and a test set of the remaining 15% of the examples."
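The classification noise model and majority-vote aggregation quoted above can be sketched as follows. This is a minimal illustration assuming binary 0/1 labels; the helper names `noisy_label` and `majority_vote` are hypothetical, not from the paper.

```python
import random
from collections import Counter

def noisy_label(true_label, p_flip=0.25, rng=random):
    """Classification noise model (Angluin and Laird 1988): flip the
    true binary label independently with probability p_flip."""
    return 1 - true_label if rng.random() < p_flip else true_label

def majority_vote(labels):
    """Aggregate repeated noisy labels for one example by majority vote
    (ties broken by first-seen label, an arbitrary choice here)."""
    return Counter(labels).most_common(1)[0][0]

# Simulate relabeling one example five times and aggregating.
rng = random.Random(0)
true = 1
votes = [noisy_label(true, 0.25, rng) for _ in range(5)]
print(votes, "->", majority_vote(votes))
```

With the 0.25 flip probability from the paper, a single label is wrong a quarter of the time, but the majority of several independent relabels is wrong far less often, which is what makes relabeling worth part of the budget.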