Cost-Accuracy Aware Adaptive Labeling for Active Learning
Authors: Ruijiang Gao, Maytal Saar-Tsechansky
AAAI 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We empirically evaluate the effectiveness and robustness of our method for settings with different cost-accuracy trade-offs reported in prior work to arise in crowdsourcing markets, and for five UCI datasets and a real crowdsourcing dataset. Our results show that our approach offers state-of-the-art performance across settings. |
| Researcher Affiliation | Academia | Ruijiang Gao, University of Texas at Austin, ruijiang@utexas.edu; Maytal Saar-Tsechansky, University of Texas at Austin, maytal@mail.utexas.edu |
| Pseudocode | Yes | The complete Generalization Bound based Active Learning (GBAL) algorithm based on the criterion in Equation 9, and the Adaptive GBAL (AGB) are shown in Algorithms 1 and 2. |
| Open Source Code | Yes | Code is available at https://github.com/ruijiang81/AGB |
| Open Datasets | Yes | We use the publicly available UCI datasets (...) We evaluated our approach using the following datasets: German, Mushroom, Pen Digits, Spambase, Audit (Hooda, Bawa, and Rana 2018) from the UCI Machine Learning Repository (Bache and Lichman 2013). |
| Dataset Splits | No | We divide each dataset into initial, train and test set, consisting of 5%, 65% and 30% of the data, respectively. The quoted text specifies initial, train, and test splits but does not mention a validation set. |
| Hardware Specification | No | The paper does not provide any specific details regarding the hardware used for running its experiments, such as CPU or GPU models. |
| Software Dependencies | No | Logistic Regression is used as the classifier in our experiments. The paper names the classifier used but does not provide specific software dependencies with version numbers for reproducibility. |
| Experiment Setup | Yes | We divide each dataset into initial, train and test set, consisting of 5%, 65% and 30% of the data, respectively. In order to create diverse labelers, we first create 30 clusters using KMeans (...) for each dataset. In addition, as in (Huang et al. 2017), we simulate five labelers with cost levels: 5, 4, 3, 2, 1, which are associated with overall labeling accuracies from high to low, respectively. Each labeler is an expert in some random set of clusters by exhibiting a high probability of correctly labeling instances from the corresponding cluster. In particular, the probabilities that a labeler correctly labels instances in her expert clusters are 0.95, 0.925, 0.9, 0.875, and 0.85; these probabilities for non-expert clusters are 0.61, 0.585, 0.56, 0.535, and 0.51, respectively. A minimal simulation sketch follows the table. |
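
For readers reconstructing this setup, the sketch below is a minimal, hypothetical rendering of the described splits and labeler simulation; it is not the authors' released code (see the GitHub link above). It assumes scikit-learn, binary {0,1} labels, and a `n_expert_clusters` parameter, since the paper only says each labeler is an expert in "some random set of clusters".

```python
# Minimal sketch of the reported experiment setup (assumptions noted inline).
import numpy as np
from sklearn.cluster import KMeans
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

def split_dataset(X, y):
    """5% initial / 65% train / 30% test, per the paper."""
    X_rest, X_test, y_rest, y_test = train_test_split(
        X, y, test_size=0.30, random_state=0)
    # The initial set is 5% of the full data, i.e. 5/70 of the remaining 70%.
    X_train, X_init, y_train, y_init = train_test_split(
        X_rest, y_rest, test_size=0.05 / 0.70, random_state=0)
    return (X_init, y_init), (X_train, y_train), (X_test, y_test)

# Five simulated labelers: costs, expert accuracies, and non-expert
# accuracies as quoted from the paper.
COSTS = [5, 4, 3, 2, 1]
EXPERT_ACC = [0.95, 0.925, 0.9, 0.875, 0.85]
NONEXPERT_ACC = [0.61, 0.585, 0.56, 0.535, 0.51]
N_CLUSTERS = 30

def make_labelers(X, n_expert_clusters=10):
    """Return a query function giving labeler k's (possibly noisy) label
    for instance i. n_expert_clusters is a guess: the paper only says
    'some random set of clusters'."""
    clusters = KMeans(n_clusters=N_CLUSTERS, n_init=10,
                      random_state=0).fit_predict(X)
    expert_sets = [rng.choice(N_CLUSTERS, size=n_expert_clusters, replace=False)
                   for _ in COSTS]

    def query(i, k, y_true):
        acc = EXPERT_ACC[k] if clusters[i] in expert_sets[k] else NONEXPERT_ACC[k]
        # Flip the binary label with probability 1 - acc (assumes labels in {0, 1}).
        return y_true[i] if rng.random() < acc else 1 - y_true[i]

    return query
```

Under this reading, labeler cost and overall accuracy move together (cost 5 pairs with 0.95/0.61, cost 1 with 0.85/0.51), which is the cost-accuracy trade-off the paper's adaptive labeling method is designed to exploit.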