Agnostic Multi-Group Active Learning

Authors: Nicholas Rittler, Kamalika Chaudhuri

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | Our main challenge is that standard active learning techniques such as disagreement-based active learning do not directly apply to the multi-group learning objective. We modify existing algorithms to provide a consistent active learning algorithm for an agnostic formulation of multi-group learning, which, given a collection of G distributions and a hypothesis class H with VC-dimension d, outputs an ϵ-optimal hypothesis using O((ν²/ϵ²) G d θ_G² log²(1/ϵ) + G log(1/ϵ)/ϵ²) label queries, where θ_G is the worst-case disagreement coefficient over the collection. Roughly speaking, this guarantee improves upon the label complexity of standard multi-group learning in regimes where disagreement-based active learning algorithms may be expected to succeed and the number of groups is not too large. We also consider the special case where each distribution in the collection is individually realizable with respect to H, and demonstrate that O(G d θ_G log(1/ϵ)) label queries are sufficient for learning in this case. We further give an approximation result for the full agnostic case inspired by the group-realizable strategy.
Researcher Affiliation | Academia | Nick Rittler, University of California San Diego, nrittler@ucsd.edu; Kamalika Chaudhuri, University of California San Diego, kamalika@cs.ucsd.edu
Pseudocode | Yes | Algorithm 1: General Agnostic Algorithm; Algorithm 2: Group Realizable Algorithm.
Open Source Code | No | The paper does not provide any links to source code or make an explicit statement about releasing code for the described methodology.
Open Datasets | No | The paper is theoretical, focusing on algorithm design and label complexity; it does not mention specific named datasets or provide access information for a publicly available dataset.
Dataset Splits | No | The paper is theoretical and does not describe any training, validation, or test dataset splits.
Hardware Specification | No | The paper is theoretical and does not describe any hardware used for experiments.
Software Dependencies | No | The paper is theoretical and does not mention specific software dependencies with version numbers.
Experiment Setup | No | The paper presents algorithms and theoretical guarantees; it does not provide experimental setup details such as hyperparameters, optimizer settings, or training schedules.
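The paper's Algorithms 1 and 2 are not reproduced in this report. As background for the disagreement-based approach the abstract refers to, here is a minimal sketch of classical CAL-style disagreement-based active learning over a finite hypothesis class, assuming realizability. The threshold class, target, and point stream are illustrative assumptions, not objects from the paper: labels are queried only for points in the current disagreement region, and labels outside it are inferred for free, which is the source of the label savings the paper's bounds quantify.

```python
import random

def cal_sketch(points, oracle, hypotheses):
    """CAL-style disagreement-based active learning (illustrative sketch).

    points: unlabeled stream; oracle(x) returns the true label (one query);
    hypotheses: finite class, each h(x) -> 0/1. Assumes realizability:
    some h in hypotheses matches oracle everywhere.
    """
    version_space = list(hypotheses)
    queries = 0
    for x in points:
        preds = {h(x) for h in version_space}
        if len(preds) > 1:            # x lies in the disagreement region
            y = oracle(x)             # spend one label query
            queries += 1
            version_space = [h for h in version_space if h(x) == y]
        # otherwise all surviving hypotheses agree, so the label is free
    return version_space, queries

# Illustrative class: thresholds on [0, 1], h_t(x) = 1 iff x >= t.
thresholds = [lambda x, t=t: int(x >= t) for t in [i / 10 for i in range(11)]]
target = lambda x: int(x >= 0.5)      # realizable: target is in the class
points = [i / 100 for i in range(100)]
random.seed(0)
random.shuffle(points)
survivors, queries = cal_sketch(points, target, thresholds)
print(queries, len(survivors))        # far fewer queries than 100 points
```

Each query removes at least one inconsistent hypothesis, so at most |H| - 1 = 10 labels are ever requested here; a multi-group variant would, roughly, run such a procedure against each of the G distributions, which is where a G-dependent factor in the label complexity comes from.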