Contrastive Adapters for Foundation Model Group Robustness
Authors: Michael Zhang, Christopher Ré
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In our experiments, we validate that contrastive adapting effectively and efficiently improves FM group robustness. First, across all 9 robustness benchmarks, we find contrastive adapting consistently improves worst-group accuracy over zero-shot (by 8.5 to 56.0 pp), using no training group labels and only training MLPs with 0.1% to 0.3% of the original FM parameters. Then, on a representative set of benchmarks with various group shifts and training data group sizes, we find contrastive adapting can substantially outperform prior adapter training strategies, and outperforms other approaches that only use fixed FM embeddings (achieving up to 12.4 pp higher worst-group accuracy than the next best method on average). |
| Researcher Affiliation | Academia | Michael Zhang and Christopher Ré Department of Computer Science Stanford University {mzhang, chrismre}@cs.stanford.edu |
| Pseudocode | Yes | Restating the standard sample cross-entropy loss with adapters makes this clear as an InfoNCE loss [14, 56]: $\ell(f(u), y) = -\log \frac{\exp(\hat{f}(u)^\top \hat{v}_y / \tau)}{\sum_{c=1}^{C} \exp(\hat{f}(u)^\top \hat{v}_c / \tau)}$ |
| Open Source Code | Yes | Did you include the code, data, and instructions needed to reproduce the main experimental results (either in the supplemental material or as a URL)? [Yes] See attached zip in supplementary. |
| Open Datasets | Yes | We benchmark methods on the following sources of group shift (Figure 2): Spurious confounders. For example, in Waterbirds [65, 75], a water background is a confounder for the waterbirds class. Subclass variance. For example, in BREEDS Living-17 [67], the ape class includes images of gibbons and gorillas. Data source variance. For example, we set up the CIFAR-10.02 dataset by combining CIFAR-10 [41] and CIFAR-10.2 [49]. |
| Dataset Splits | Yes | Different from domain generalization or OOD evaluation settings [31, 32, 44, 64, 82], we observe each data group in training, validation, and test splits. |
| Hardware Specification | No | The paper states: “Did you include the total amount of compute and the type of resources used (e.g., type of GPUs, internal cluster, or cloud provider)? [Yes] For space we defer to the appendix.” As the main paper does not contain these details and the appendix is not provided, the specific hardware specifications are not directly available in the analyzed text. |
| Software Dependencies | No | The paper mentions that experimental details, including hyperparameters, are in Appendix C, but it does not specify any software names with version numbers in the main text that would be necessary for replication. |
| Experiment Setup | Yes | We include experimental details for all models and hyperparameters in Appendix C. |
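The InfoNCE-form cross-entropy quoted under Pseudocode can be illustrated numerically. Below is a minimal sketch, assuming NumPy; the function name `infonce_cross_entropy`, its argument layout, and the temperature default are illustrative assumptions, not the authors' implementation:

```python
import numpy as np

def infonce_cross_entropy(f_u, class_embeds, y, tau=0.07):
    """Cross-entropy over temperature-scaled cosine similarities
    (InfoNCE form).

    f_u:          adapter output embedding, shape (d,)
    class_embeds: per-class embeddings v_1..v_C, shape (C, d)
    y:            index of the correct class
    tau:          temperature (hypothetical default)
    """
    # L2-normalize embeddings (the hat notation in the loss)
    f_hat = f_u / np.linalg.norm(f_u)
    v_hat = class_embeds / np.linalg.norm(class_embeds, axis=1, keepdims=True)

    logits = v_hat @ f_hat / tau      # cosine similarities over temperature
    logits -= logits.max()            # subtract max for numerical stability
    log_probs = logits - np.log(np.exp(logits).sum())
    return -log_probs[y]              # -log softmax probability of class y
```

With orthogonal class embeddings, the loss is lower when `y` indexes the class embedding most aligned with `f_u`, which is the behavior the InfoNCE form encodes.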