Agnostic Multi-Group Active Learning

Authors: Nicholas Rittler, Kamalika Chaudhuri

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | Our main challenge is that standard active learning techniques such as disagreement-based active learning do not directly apply to the multi-group learning objective. We modify existing algorithms to provide a consistent active learning algorithm for an agnostic formulation of multi-group learning, which, given a collection of G distributions and a hypothesis class H with VC-dimension d, outputs an ϵ-optimal hypothesis using O((ν²/ϵ²) G d θ_G² log²(1/ϵ) + G log(1/ϵ)/ϵ²) label queries, where θ_G is the worst-case disagreement coefficient over the collection. Roughly speaking, this guarantee improves upon the label complexity of standard multi-group learning in regimes where disagreement-based active learning algorithms may be expected to succeed and the number of groups is not too large. We also consider the special case where each distribution in the collection is individually realizable with respect to H, and demonstrate that O(G d θ_G log(1/ϵ)) label queries are sufficient for learning in this case. We further give an approximation result for the full agnostic case inspired by the group-realizable strategy.
Researcher Affiliation | Academia | Nick Rittler, University of California San Diego, nrittler@ucsd.edu; Kamalika Chaudhuri, University of California San Diego, kamalika@cs.ucsd.edu
Pseudocode | Yes | Algorithm 1: General Agnostic Algorithm; Algorithm 2: Group Realizable Algorithm.
Open Source Code | No | The paper does not provide any links to source code or make an explicit statement about releasing code for the described methodology.
Open Datasets | No | The paper is theoretical, focusing on algorithm design and label complexity; it does not mention specific named datasets or provide access information for a publicly available dataset.
Dataset Splits | No | The paper is theoretical and does not describe any training, validation, or test dataset splits.
Hardware Specification | No | The paper is theoretical and does not describe any hardware used for experiments.
Software Dependencies | No | The paper is theoretical and does not mention specific software dependencies with version numbers.
Experiment Setup | No | The paper presents algorithms and theoretical guarantees; it does not provide experimental setup details such as hyperparameters, optimizer settings, or training schedules.
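The paper's Algorithms 1 and 2 are not reproduced in this report. As background for the disagreement-based approach the abstract refers to, here is a minimal sketch of classical CAL-style disagreement-based active learning over a finite hypothesis class, assuming realizability. The threshold class, target, and point stream are illustrative assumptions, not objects from the paper: labels are queried only for points in the current disagreement region, and labels outside it are inferred for free, which is the source of the label savings the paper's bounds quantify.

```python
import random

def cal_sketch(points, oracle, hypotheses):
    """CAL-style disagreement-based active learning (illustrative sketch).

    points: unlabeled stream; oracle(x) returns the true label (one query);
    hypotheses: finite class, each h(x) -> 0/1. Assumes realizability:
    some h in hypotheses matches oracle everywhere.
    """
    version_space = list(hypotheses)
    queries = 0
    for x in points:
        preds = {h(x) for h in version_space}
        if len(preds) > 1:            # x lies in the disagreement region
            y = oracle(x)             # spend one label query
            queries += 1
            version_space = [h for h in version_space if h(x) == y]
        # otherwise all surviving hypotheses agree, so the label is free
    return version_space, queries

# Illustrative class: thresholds on [0, 1], h_t(x) = 1 iff x >= t.
thresholds = [lambda x, t=t: int(x >= t) for t in [i / 10 for i in range(11)]]
target = lambda x: int(x >= 0.5)      # realizable: target is in the class
points = [i / 100 for i in range(100)]
random.seed(0)
random.shuffle(points)
survivors, queries = cal_sketch(points, target, thresholds)
print(queries, len(survivors))        # far fewer queries than 100 points
```

Each query removes at least one inconsistent hypothesis, so at most |H| - 1 = 10 labels are ever requested here; a multi-group variant would, roughly, run such a procedure against each of the G distributions, which is where a G-dependent factor in the label complexity comes from.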