Contrastive Adapters for Foundation Model Group Robustness
Authors: Michael Zhang, Christopher Ré
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In our experiments, we validate that contrastive adapting effectively and efficiently improves FM group robustness. First, across all 9 robustness benchmarks, we find contrastive adapting consistently improves worst-group accuracy over zero-shot (by 8.5 to 56.0 pp), using no training group labels and only training MLPs with 0.1% to 0.3% of the original FM parameters. Then, on a representative set of benchmarks with various group shifts and training data group sizes, we find contrastive adapting can substantially outperform prior adapter training strategies, and outperforms other approaches that only use fixed FM embeddings (achieving up to 12.4 pp higher worst-group accuracy than the next best method on average). |
| Researcher Affiliation | Academia | Michael Zhang and Christopher Ré Department of Computer Science Stanford University {mzhang, chrismre}@cs.stanford.edu |
| Pseudocode | Yes | Restating the standard sample cross-entropy loss with adapters makes this clear as an InfoNCE loss [14, 56]: $\ell(f(u), y) = -\log \frac{\exp(\hat{f}(u)^\top \hat{v}_y / \tau)}{\sum_{c=1}^{C} \exp(\hat{f}(u)^\top \hat{v}_c / \tau)}$ |
| Open Source Code | Yes | Did you include the code, data, and instructions needed to reproduce the main experimental results (either in the supplemental material or as a URL)? [Yes] See attached zip in supplementary. |
| Open Datasets | Yes | We benchmark methods on the following sources of group shift (Figure 2): Spurious confounders. For example, in Waterbirds [65, 75], a water background is a confounder for the waterbirds class. Subclass variance. For example, in BREEDS Living-17 [67], the ape class includes images of gibbons and gorillas. Data source variance. For example, we set up the CIFAR-10.02 dataset by combining CIFAR-10 [41] and CIFAR-10.2 [49]. |
| Dataset Splits | Yes | Different from domain generalization or OOD evaluation settings [31, 32, 44, 64, 82], we observe each data group in training, validation, and test splits. |
| Hardware Specification | No | The paper states: “Did you include the total amount of compute and the type of resources used (e.g., type of GPUs, internal cluster, or cloud provider)? [Yes] For space we defer to the appendix.” As the main paper does not contain these details and the appendix is not provided, the specific hardware specifications are not directly available in the analyzed text. |
| Software Dependencies | No | The paper mentions that experimental details, including hyperparameters, are in Appendix C, but it does not specify any software names with version numbers in the main text that would be necessary for replication. |
| Experiment Setup | Yes | We include experimental details for all models and hyperparameters in Appendix C. |
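The InfoNCE-form cross-entropy quoted under Pseudocode can be illustrated numerically. Below is a minimal sketch, assuming NumPy; the function name `infonce_cross_entropy`, its argument layout, and the temperature default are illustrative assumptions, not the authors' implementation:

```python
import numpy as np

def infonce_cross_entropy(f_u, class_embeds, y, tau=0.07):
    """Cross-entropy over temperature-scaled cosine similarities
    (InfoNCE form).

    f_u:          adapter output embedding, shape (d,)
    class_embeds: per-class embeddings v_1..v_C, shape (C, d)
    y:            index of the correct class
    tau:          temperature (hypothetical default)
    """
    # L2-normalize embeddings (the hat notation in the loss)
    f_hat = f_u / np.linalg.norm(f_u)
    v_hat = class_embeds / np.linalg.norm(class_embeds, axis=1, keepdims=True)

    logits = v_hat @ f_hat / tau      # cosine similarities over temperature
    logits -= logits.max()            # subtract max for numerical stability
    log_probs = logits - np.log(np.exp(logits).sum())
    return -log_probs[y]              # -log softmax probability of class y
```

With orthogonal class embeddings, the loss is lower when `y` indexes the class embedding most aligned with `f_u`, which is the behavior the InfoNCE form encodes.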