Adaptive Region-Based Active Learning

Authors: Corinna Cortes, Giulia Desalvo, Claudio Gentile, Mehryar Mohri, Ningshan Zhang

ICML 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We also report the results of an extensive suite of experiments on several real-world datasets demonstrating substantial empirical benefits over existing single-region and non-adaptive region-based active learning baselines." "In Section 5, we report the results of a series of experiments on multiple datasets, demonstrating the substantial benefits of ARBAL over existing non-region-based active learning algorithms, such as IWAL and margin-based uncertainty sampling, and over the non-adaptive region-based active learning baseline ORIWAL (Cortes et al., 2019b)."
Researcher Affiliation | Collaboration | Google Research, New York, NY; Courant Institute of Mathematical Sciences, New York, NY; Hudson River Trading, New York, NY.
Pseudocode | Yes | "The pseudocode of ARBAL is given in Algorithm 1." "The pseudocode of SPLIT is given in Algorithm 2."
Open Source Code | No | No explicit statement or link regarding the release of source code for the described methodology was found.
Open Datasets | Yes | "We tested 24 binary classification datasets from the UCI and openml repositories, and also the MNIST dataset with 3 and 5 as the two classes, which is a standard binary classification task extracted from the MNIST dataset (e.g., (Crammer et al., 2009))."
Dataset Splits | Yes | "For each experiment, we randomly shuffled the dataset, ran the algorithms on the first half of the data (so that the number of active learning rounds T equals N/2), and tested the classifier returned on the remaining half to measure misclassification loss."
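The quoted evaluation protocol (shuffle, run active learning on the first half so T = N/2, test on the second half) can be sketched as follows. This is a minimal illustration in NumPy; the function names and the seed handling are assumptions, not taken from the paper.

```python
import numpy as np

def shuffle_and_split(X, y, seed=0):
    """Shuffle the dataset, then split it in half: the first half is the
    pool for the active learner (T = N/2 rounds), the second half is the
    held-out test set. Names here are illustrative, not from the paper."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    half = len(X) // 2                       # number of active-learning rounds T
    return (X[idx[:half]], y[idx[:half]],    # active-learning pool
            X[idx[half:]], y[idx[half:]])    # held-out test half

def misclassification_loss(y_true, y_pred):
    """Fraction of held-out points the returned classifier gets wrong."""
    return float(np.mean(y_true != y_pred))
```

A fresh shuffle per experiment (a new seed each run) reproduces the "for each experiment, we randomly shuffled the dataset" step.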
Hardware Specification | No | No specific hardware details (such as GPU/CPU models, processor types, or memory amounts) used for running the experiments were mentioned.
Software Dependencies | No | No specific software dependencies with version numbers were explicitly mentioned (e.g., Python 3.x, PyTorch 1.x).
Experiment Setup | Yes | "We chose κ = 20 and allow the first phase to run at most τ = 800 rounds so as to make ARBAL fully split into the desired number of regions on almost all datasets. Since the slack terms σ_T derived from high-probability analyses are typically overly conservative, we simply use 0.01/√T_k in the SPLIT subroutine. ... We set ρ = 0.01 in our experiments. ... We use the logistic loss function ℓ defined for all (x, y) ∈ X × Y and hypotheses h: X → ℝ by ℓ(h(x), y) = log(1 + e^(−y h(x))), which we then rescale to [0, 1]. The initial hypothesis set H consists of 3,000 randomly drawn hyperplanes with bounded norms."
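The two concrete ingredients of this setup, the logistic loss rescaled to [0, 1] and a pool of 3,000 bounded-norm random hyperplanes, can be sketched as below. The rescaling constant and the sampling scheme are assumptions: the paper states only "rescale to [0, 1]" and "bounded norms" without giving the exact construction.

```python
import numpy as np

def rescaled_logistic_loss(score, y, score_bound=1.0):
    """ℓ(h(x), y) = log(1 + exp(-y * h(x))), divided by its maximum
    value log(1 + exp(score_bound)) so the loss lies in [0, 1].
    Assumes |h(x)| <= score_bound; the bound is an illustrative choice."""
    raw = np.log1p(np.exp(-np.asarray(y) * np.asarray(score)))
    return raw / np.log1p(np.exp(score_bound))

def random_hyperplanes(n_hypotheses=3000, dim=10, norm_bound=1.0, seed=0):
    """Draw weight vectors with Gaussian entries, then normalize and scale
    so every hyperplane has norm exactly norm_bound (hence bounded norm).
    One plausible reading of "randomly drawn hyperplanes with bounded norms"."""
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((n_hypotheses, dim))
    W /= np.linalg.norm(W, axis=1, keepdims=True)
    return norm_bound * W
```

With unit-norm hyperplanes and inputs normalized to the unit ball, |h(x)| ≤ 1 holds by Cauchy–Schwarz, so score_bound=1.0 keeps the rescaled loss in [0, 1].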