Agnostic Multi-Group Active Learning
Authors: Nicholas Rittler, Kamalika Chaudhuri
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | Our main challenge is that standard active learning techniques such as disagreement-based active learning do not directly apply to the multi-group learning objective. We modify existing algorithms to provide a consistent active learning algorithm for an agnostic formulation of multi-group learning, which, given a collection of G distributions and a hypothesis class H with VC-dimension d, outputs an ϵ-optimal hypothesis using O((ν²/ϵ²) G d θ_G² log²(1/ϵ) + G log(1/ϵ)/ϵ²) label queries, where θ_G is the worst-case disagreement coefficient over the collection. Roughly speaking, this guarantee improves upon the label complexity of standard multi-group learning in regimes where disagreement-based active learning algorithms may be expected to succeed and the number of groups is not too large. We also consider the special case where each distribution in the collection is individually realizable with respect to H, and demonstrate that O(G d θ_G log(1/ϵ)) label queries are sufficient for learning in this case. We further give an approximation result for the full agnostic case inspired by the group-realizable strategy. |
| Researcher Affiliation | Academia | Nick Rittler, University of California San Diego (nrittler@ucsd.edu); Kamalika Chaudhuri, University of California San Diego (kamalika@cs.ucsd.edu) |
| Pseudocode | Yes | Algorithm 1: General Agnostic Algorithm; Algorithm 2: Group Realizable Algorithm. |
| Open Source Code | No | The paper does not provide any links to source code or make an explicit statement about releasing code for the described methodology. |
| Open Datasets | No | The paper is theoretical and focuses on algorithm design and label complexity. It does not mention specific named datasets or provide any access information for a publicly available dataset. |
| Dataset Splits | No | The paper is theoretical and focuses on algorithm design and label complexity. It does not describe any specific training, validation, or test dataset splits. |
| Hardware Specification | No | The paper is theoretical and focuses on algorithm design and analysis of label complexity. It does not describe any hardware used for experiments. |
| Software Dependencies | No | The paper is theoretical and focuses on algorithm design and analysis. It does not mention specific software dependencies with version numbers. |
| Experiment Setup | No | The paper is theoretical and presents algorithms and their guarantees. It does not provide specific experimental setup details such as hyperparameters, optimizer settings, or training schedules. |
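For reference, the label complexity bounds quoted in the abstract above can be typeset as follows. This is a transcription of the guarantees as stated, with G the number of distributions, d the VC dimension of H, and θ_G the worst-case disagreement coefficient over the collection; interpreting ν as the optimal in-class error follows the standard agnostic-learning convention and is not spelled out in the excerpt itself.

```latex
% General agnostic case (Algorithm 1):
O\!\left( \frac{\nu^2}{\epsilon^2}\, G\, d\, \theta_{G}^{2} \log^{2}(1/\epsilon)
          \;+\; \frac{G \log(1/\epsilon)}{\epsilon^{2}} \right)

% Group-realizable special case (Algorithm 2):
O\!\left( G\, d\, \theta_{G} \log(1/\epsilon) \right)
```

Note the contrast the abstract draws: the group-realizable bound depends on ϵ only logarithmically, while the agnostic bound retains the 1/ϵ² terms familiar from passive agnostic learning.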