Active Learning Helps Pretrained Models Learn the Intended Task

Authors: Alex Tamkin, Dat Nguyen, Salil Deshpande, Jesse Mu, Noah Goodman

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We investigate whether pretrained models are better active learners, capable of choosing examples that improve robustness to spurious correlations and domain shifts. Intriguingly, we find that better active learning is an emergent property of the pretraining process: pretrained models require up to 5 times fewer labels when using uncertainty-based active learning, while non-pretrained models see no or even negative benefit. We consider the use of active learning (AL) on a range of real-world image and text datasets where task ambiguity arises. We compare several AL acquisition functions against a random-sampling baseline and measure the difference in performance with and without the use of pretrained models. (A minimal uncertainty-sampling sketch appears after this table.)
Researcher Affiliation | Academia | Stanford University
Pseudocode | No | The paper describes the active learning procedure in narrative text and provides a mathematical formula for the acquisition function, but it does not include a formally labeled 'Pseudocode' or 'Algorithm' block, nor structured steps formatted like code.
Open Source Code | Yes | Code and training scripts are available at: https://github.com/alextamkin/active-learning-pretrained-models.
Open Datasets | Yes | We consider a variety of datasets where task ambiguity manifests through a scarcity of particular kinds of examples. We consider two such kinds of examples: those defined by combinations of causal and spurious features (typical vs. atypical backgrounds) as well as those defined by unseen attributes that shift during deployment (product categories and camera trap locations). These datasets provide an empirical testbed for the ability of pretrained models to choose disambiguating examples using active learning (AL). Datasets: Waterbirds [49], Treeperson (created from Visual Genome [33]), iWildCam [5, 31], Amazon-WILDS [43, 31].
Dataset Splits | No | The paper discusses using a seed set and acquiring data from an unlabeled pool. It states that its method aims at 'Removing the need for a separate validation set' for early stopping, yet it still refers to 'validation datasets' when plotting results. It does not provide explicit percentages or sample counts for the training, validation, and test splits needed to reproduce the initial partitioning of the datasets.
Hardware Specification | No | The paper does not provide any specific hardware details such as GPU/CPU models, processor types, or memory specifications used for running the experiments.
Software Dependencies | No | The paper refers to models like BiT and RoBERTa, and mentions using 'standard learning rates and other hyperparameters recommended by model developers' (Appendix B), but it does not specify software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow versions).
Experiment Setup | Yes | Appendix B: Experimental Details provides specific experimental setup details, including seed set sizes (e.g., 'Waterbirds: 100 examples / class'), acquisition step sizes (e.g., 'Waterbirds: 20 examples'), learning rates ('Vision: 1e-4', 'Text: 5e-5'), batch sizes ('Vision: 64', 'Text: 32'), optimizer ('Adam'), weight decay ('1e-4'), and the finetuning termination heuristic ('stop finetuning when the training loss decreases to 0.1% of the original training loss'). (A sketch of this stopping heuristic appears after this table.)
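
The 'Research Type' row above contrasts uncertainty-based acquisition with a random-sampling baseline. For orientation only, here is a minimal Python/PyTorch sketch of one common uncertainty acquisition function (predictive entropy) next to random sampling; it is not the authors' implementation, and the function names, the `(pool_index, input)` loader interface, and other details are illustrative assumptions.

```python
# Minimal sketch: uncertainty-based acquisition vs. a random-sampling baseline.
# Not the authors' code; names and the loader interface are assumptions.
import torch
import torch.nn.functional as F


def acquire_uncertain(model, unlabeled_loader, k, device="cpu"):
    """Select the k unlabeled examples with the highest predictive entropy."""
    model.eval()
    scores, indices = [], []
    with torch.no_grad():
        for idx, x in unlabeled_loader:  # assumes the loader yields (pool_index, input)
            probs = F.softmax(model(x.to(device)), dim=-1)
            entropy = -(probs * probs.clamp_min(1e-12).log()).sum(dim=-1)
            scores.append(entropy.cpu())
            indices.append(idx)
    scores = torch.cat(scores)
    indices = torch.cat(indices)
    top = torch.topk(scores, k).indices
    return indices[top].tolist()  # pool indices to send for labeling


def acquire_random(pool_size, k):
    """Random-sampling baseline: pick k pool indices uniformly at random."""
    return torch.randperm(pool_size)[:k].tolist()
```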
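
The 'Experiment Setup' row quotes a finetuning termination heuristic: stop when the training loss falls to 0.1% of the original training loss. The sketch below shows one way such a loop could look using the Appendix B hyperparameters (Adam, lr 1e-4 for vision, weight decay 1e-4); the surrounding training loop is an assumption, not code from the released repository.

```python
# Minimal sketch of the reported stopping heuristic, assuming a standard
# PyTorch finetuning loop. Hyperparameter defaults mirror Appendix B (vision).
import torch


def finetune(model, train_loader, loss_fn, lr=1e-4, weight_decay=1e-4,
             max_epochs=100, stop_frac=1e-3, device="cpu"):
    optimizer = torch.optim.Adam(model.parameters(), lr=lr, weight_decay=weight_decay)
    initial_loss = None
    model.train()
    for epoch in range(max_epochs):
        epoch_loss, n_batches = 0.0, 0
        for x, y in train_loader:
            optimizer.zero_grad()
            loss = loss_fn(model(x.to(device)), y.to(device))
            loss.backward()
            optimizer.step()
            epoch_loss += loss.item()
            n_batches += 1
        mean_loss = epoch_loss / max(n_batches, 1)
        if initial_loss is None:
            # Approximate the "original training loss" by the first epoch's mean loss.
            initial_loss = mean_loss
        elif mean_loss <= stop_frac * initial_loss:  # 0.1% of the original training loss
            break
    return model
```

Because the criterion depends only on the training loss, it matches the paper's stated goal of removing the need for a separate validation set for early stopping.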