Adaptive Region-Based Active Learning
Authors: Corinna Cortes, Giulia Desalvo, Claudio Gentile, Mehryar Mohri, Ningshan Zhang
ICML 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We also report the results of an extensive suite of experiments on several real-world datasets demonstrating substantial empirical benefits over existing single-region and non-adaptive region-based active learning baselines. In Section 5, we report the results of a series of experiments on multiple datasets, demonstrating the substantial benefits of ARBAL over existing non-region-based active learning algorithms, such as IWAL and margin-based uncertainty sampling, and over the non-adaptive region-based active learning baseline ORIWAL (Cortes et al., 2019b). |
| Researcher Affiliation | Collaboration | 1Google Research, New York, NY; 2Courant Institute of Mathematical Sciences, New York, NY; 3Hudson River Trading, New York, NY. |
| Pseudocode | Yes | The pseudocode of ARBAL is given in Algorithm 1. The pseudocode of SPLIT is given in Algorithm 2. |
| Open Source Code | No | No explicit statement or link regarding the release of source code for the described methodology was found. |
| Open Datasets | Yes | We tested 24 binary classification datasets from the UCI and OpenML repositories, and also the MNIST dataset with 3 and 5 as the two classes, which is a standard binary classification task extracted from the MNIST dataset (e.g., (Crammer et al., 2009)). |
| Dataset Splits | Yes | For each experiment, we randomly shuffled the dataset, ran the algorithms on the first half of the data (so that the number of active learning rounds T equals N/2), and tested the classifier returned on the remaining half to measure misclassification loss. |
| Hardware Specification | No | No specific hardware details (like GPU/CPU models, processor types, or memory amounts) used for running experiments were mentioned. |
| Software Dependencies | No | No specific software dependencies with version numbers were explicitly mentioned (e.g., Python 3.x, PyTorch 1.x). |
| Experiment Setup | Yes | We chose κ = 20 and allow the first phase to run at most τ = 800 rounds so as to make ARBAL fully split into the desired number of regions on almost all datasets. Since the slack term σ_T derived from high-probability analyses is typically overly conservative, we simply use 0.01/√T_k in the SPLIT subroutine. ... We set ρ = 0.01 in our experiments. ... We use the logistic loss function ℓ defined for all (x, y) ∈ X × Y and hypotheses h: X → ℝ by ℓ(h(x), y) = log(1 + e^(−y·h(x))), which we then rescale to [0, 1]. The initial hypothesis set H consists of 3,000 randomly drawn hyperplanes with bounded norms. |
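The dataset-split protocol quoted above (shuffle, run the active learner on the first half so that T = N/2, and measure misclassification loss on the remaining half) can be sketched as follows. This is a minimal illustration, not the paper's code; `train_and_label` is a hypothetical stand-in for any active learning algorithm such as ARBAL.

```python
import numpy as np

def evaluate_split(X, y, train_and_label, rng=None):
    """Shuffle the dataset, run a learner on the first half (T = N/2
    rounds), and measure misclassification loss on the second half.

    `train_and_label` is a hypothetical placeholder: it takes
    (X_train, y_train) and returns a predictor mapping a feature
    matrix to labels in {-1, +1}.
    """
    if rng is None:
        rng = np.random.default_rng(0)
    perm = rng.permutation(len(X))      # random shuffle of the dataset
    X, y = X[perm], y[perm]
    T = len(X) // 2                     # number of active learning rounds
    h = train_and_label(X[:T], y[:T])   # train on the first half
    preds = h(X[T:])                    # predict on the held-out half
    return np.mean(preds != y[T:])      # misclassification loss
```

A trivial usage example with a majority-class learner confirms the returned loss lies in [0, 1].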
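The experiment setup row uses the logistic loss ℓ(h(x), y) = log(1 + e^(−y·h(x))) rescaled to [0, 1]. The paper does not state the exact normalizer; the sketch below assumes scores are bounded, |h(x)| ≤ B (consistent with the bounded-norm hyperplanes mentioned), and divides by the maximum value log(1 + e^B) as an illustrative choice.

```python
import numpy as np

def logistic_loss_rescaled(score, y, B=1.0):
    """Logistic loss log(1 + exp(-y*score)), rescaled to [0, 1].

    Assumes |score| <= B; the normalizer log(1 + e^B) is an
    assumption for illustration, not taken from the paper.
    """
    raw = np.log1p(np.exp(-y * score))   # standard logistic loss
    return raw / np.log1p(np.exp(B))     # divide by its maximum on [-B, B]
```

With B = 1, a maximally wrong score (score = 1, y = −1) yields exactly 1, and the loss decreases monotonically as y·score grows.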