Re-Active Learning: Active Learning with Relabeling

Authors: Christopher H. Lin, Mausam, Daniel S. Weld

AAAI 2016

Reproducibility assessment (variable, result, and supporting LLM response):
Research Type: Experimental
LLM Response: "We conduct empirical experiments on both synthetic and real-world datasets, showing that our new algorithms significantly outperform traditional active learning techniques and other natural baselines on the problem of re-active learning."
Researcher Affiliation: Academia
LLM Response: "Christopher H. Lin, University of Washington, Seattle, WA (chrislin@cs.washington.edu); Mausam, Indian Institute of Technology, Delhi, India (mausam@cse.iitd.ac.in); Daniel S. Weld, University of Washington, Seattle, WA (weld@cs.washington.edu)"
Pseudocode: Yes
LLM Response: "Algorithm 1 describes the framework for computing the impact of an example x_i."
Open Source Code: No
LLM Response: The paper does not provide any explicit statements about releasing source code or links to a code repository.
Open Datasets: Yes
LLM Response: "Figure 3 compares impact sampling against uncertainty sampling on datasets from the UCI Machine Learning Repository (Bache and Lichman 2013) with synthetically-generated labels." "We first consider a popular dataset (Kamar, Hacker, and Horvitz 2012; Lintott et al. 2011; 2008) from Galaxy Zoo."
Dataset Splits: Yes
LLM Response: "We seed training with 50 examples, use a total budget of 1,000, and test on 300 held-out examples." "We randomly generate a training set of 70% of the examples, a held-out set of 15% of the examples, and a test set of the remaining 15% of the examples."
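The 70%/15%/15% random split described above can be sketched in a few lines of Python. This is an illustrative reconstruction, not the authors' code; the function name and fixed seed are assumptions.

```python
import random

def split_70_15_15(examples, seed=0):
    """Randomly partition examples into 70% train, 15% held-out,
    and 15% test, mirroring the split described in the paper."""
    rng = random.Random(seed)
    shuffled = examples[:]
    rng.shuffle(shuffled)
    n = len(shuffled)
    n_train = int(0.70 * n)
    n_held = int(0.15 * n)
    train = shuffled[:n_train]
    held_out = shuffled[n_train:n_train + n_held]
    test = shuffled[n_train + n_held:]
    return train, held_out, test

train, held, test = split_70_15_15(list(range(1000)))
print(len(train), len(held), len(test))  # 700 150 150
```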
Hardware Specification: Yes
LLM Response: "Experiments are programmed in Python using an Intel Xeon E7-4850 v2 processor (2.3 GHz, 24M cache) with 512 GB of RAM."
Software Dependencies: No
LLM Response: The paper mentions that experiments are "programmed in Python" but does not specify version numbers for Python or for any software libraries used.
Experiment Setup: Yes
LLM Response: "We seed training with 50 examples, use a total budget of 1,000, and test on 300 held-out examples." "We vary the number of features among {10, 30, 50, 70, 90}." "We assume the classification noise model (Angluin and Laird 1988): each label is independently flipped from the true label, h*(x), with probability 0.25. We assume that label accuracy is known and use majority vote for f, the label aggregation function." "We randomly generate a training set of 70% of the examples, a held-out set of 15% of the examples, and a test set of the remaining 15% of the examples."
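The classification noise model and majority-vote aggregation quoted above can be sketched as follows. This is a minimal illustration assuming binary 0/1 labels; the helper names `noisy_label` and `majority_vote` are hypothetical, not from the paper.

```python
import random
from collections import Counter

def noisy_label(true_label, p_flip=0.25, rng=random):
    """Classification noise model (Angluin and Laird 1988): flip the
    true binary label independently with probability p_flip."""
    return 1 - true_label if rng.random() < p_flip else true_label

def majority_vote(labels):
    """Aggregate repeated noisy labels for one example by majority vote
    (ties broken by first-seen label, an arbitrary choice here)."""
    return Counter(labels).most_common(1)[0][0]

# Simulate relabeling one example five times and aggregating.
rng = random.Random(0)
true = 1
votes = [noisy_label(true, 0.25, rng) for _ in range(5)]
print(votes, "->", majority_vote(votes))
```

With the 0.25 flip probability from the paper, a single label is wrong a quarter of the time, but the majority of several independent relabels is wrong far less often, which is what makes relabeling worth part of the budget.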