Re-Active Learning: Active Learning with Relabeling
Authors: Christopher H. Lin, Mausam, Daniel S. Weld
AAAI 2016
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct empirical experiments on both synthetic and real-world datasets, showing that our new algorithms significantly outperform traditional active learning techniques and other natural baselines on the problem of re-active learning. |
| Researcher Affiliation | Academia | Christopher H. Lin, University of Washington, Seattle, WA (chrislin@cs.washington.edu); Mausam, Indian Institute of Technology, Delhi, India (mausam@cse.iitd.ac.in); Daniel S. Weld, University of Washington, Seattle, WA (weld@cs.washington.edu) |
| Pseudocode | Yes | Algorithm 1 describes the framework for computing the impact of an example x_i. (A hedged sketch of this impact computation follows the table.) |
| Open Source Code | No | The paper does not provide any explicit statements about releasing source code or links to a code repository. |
| Open Datasets | Yes | Figure 3 compares impact sampling against uncertainty sampling on datasets from the UCI Machine Learning Repository (Bache and Lichman 2013) with synthetically-generated labels. We first consider a popular dataset (Kamar, Hacker, and Horvitz 2012; Lintott et al. 2011; 2008) from Galaxy Zoo. |
| Dataset Splits | Yes | We seed training with 50 examples, use a total budget of 1,000, and test on 300 held-out examples. We randomly generate a training set of 70% of the examples, a held-out set of 15% of the examples, and a test set of the remaining 15% of the examples. (The 70/15/15 split is sketched after the table.) |
| Hardware Specification | Yes | Experiments are programmed in Python using an Intel Xeon E7-4850 v2 processor (2.3 GHz, 24M Cache) with 512 GB of RAM. |
| Software Dependencies | No | The paper mentions 'programmed in Python' but does not specify version numbers for Python or any other software libraries. |
| Experiment Setup | Yes | We seed training with 50 examples, use a total budget of 1,000, and test on 300 held-out examples. We vary the number of features among {10, 30, 50, 70, 90}. We assume the classification noise model (Angluin and Laird 1988): each label is independently flipped from the true label, h*(x), with probability 0.25. We assume that label accuracy is known and use majority vote for f, the label aggregation function. We randomly generate a training set of 70% of the examples, a held-out set of 15% of the examples, and a test set of the remaining 15% of the examples. (The noise model and majority-vote aggregation are sketched after the table.) |
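
The Pseudocode row references the paper's Algorithm 1, which scores each candidate example by its impact on the learned classifier. Below is a minimal Python sketch of that general idea, not the paper's exact procedure: it retrains a scikit-learn-style classifier under each hypothetical label outcome and counts how many pool predictions change. The function names, the uniform label prior, and the retrain-per-candidate strategy are our assumptions.

```python
import numpy as np
from sklearn.base import clone
from sklearn.linear_model import LogisticRegression

def expected_impact(clf, X, y, i, label_probs):
    """Expected number of pool predictions that change if example i
    receives a hypothetical (re)label; a simplification of the impact
    idea behind Algorithm 1, not the paper's exact procedure."""
    base_pred = clf.predict(X)
    impact = 0.0
    for label, p in label_probs.items():
        y_hyp = y.copy()
        y_hyp[i] = label                      # hypothetical label outcome
        clf_hyp = clone(clf).fit(X, y_hyp)    # retrain under that outcome
        impact += p * np.sum(clf_hyp.predict(X) != base_pred)
    return impact

# Toy usage: pick the example whose (re)label is expected to move the
# classifier the most. The uniform prior over outcomes is an assumption.
X = np.random.randn(100, 5)
y = (X[:, 0] > 0).astype(int)
clf = LogisticRegression().fit(X, y)
probs = {0: 0.5, 1: 0.5}
best_i = max(range(len(y)), key=lambda i: expected_impact(clf, X, y, i, probs))
```

Note that the paper distinguishes annotating a new example from relabeling an already-annotated one; this sketch collapses both into a single hypothetical-label update.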
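The Dataset Splits row describes a 70/15/15 partition for the real-data experiments. A minimal sketch, assuming scikit-learn's train_test_split and placeholder data (the paper does not publish its splitting code):

```python
import numpy as np
from sklearn.model_selection import train_test_split

X = np.random.randn(1000, 10)       # placeholder features
y = (X[:, 0] > 0).astype(int)       # placeholder binary labels

# Carve off the 70% training set first; then split the remaining 30%
# in half to get a 15% held-out set and a 15% test set.
X_train, X_rest, y_train, y_rest = train_test_split(
    X, y, train_size=0.70, random_state=0)
X_heldout, X_test, y_heldout, y_test = train_test_split(
    X_rest, y_rest, test_size=0.50, random_state=0)
```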
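The Experiment Setup row assumes the Angluin and Laird (1988) classification noise model with flip probability 0.25 and majority vote as the aggregation function f. A minimal sketch of both for binary labels; the tie-breaking rule and helper names are our assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def noisy_labels(true_label, n_annotations, flip_prob=0.25):
    """Classification noise model (Angluin and Laird 1988): each
    annotation independently flips the true binary label with
    probability flip_prob (0.25 in the paper's setup)."""
    flips = rng.random(n_annotations) < flip_prob
    return np.where(flips, 1 - true_label, true_label)

def majority_vote(labels):
    """Aggregation function f from the paper's experiments; breaking
    ties toward label 1 is our assumption, not the paper's."""
    return int(labels.sum() * 2 >= len(labels))

votes = noisy_labels(true_label=1, n_annotations=5)
print(votes, "->", majority_vote(votes))
```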