On Feature Learning in the Presence of Spurious Correlations
Authors: Pavel Izmailov, Polina Kirichenko, Nate Gruver, Andrew G. Wilson
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | On multiple vision and NLP problems, we show that the features learned by simple ERM are highly competitive with the features learned by specialized group robustness methods targeted at reducing the effect of spurious correlations. Moreover, we show that the quality of learned feature representations is greatly affected by the design decisions beyond the training method, such as the model architecture and pre-training strategy. In this paper, we provide an in-depth study of the factors that affect the quality of learned representations in the presence of spurious correlations: how accurately can we decode the core features from the learned representations. |
| Researcher Affiliation | Academia | Pavel Izmailov Polina Kirichenko Nate Gruver Andrew Gordon Wilson New York University |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Our code is available at github.com/izmailovpavel/spurious_feature_learning. |
| Open Datasets | Yes | In order to cover a broad range of practical scenarios, we consider four image classification and two text classification problems. Waterbirds [76] is a binary image classification problem... CelebA hair color [51] is a binary image classification problem... WILDS-FMOW [12, 43, 77] is a satellite image classification problem... CXR-14 [89] is a dataset with chest X-ray images... MultiNLI [91, 76] is a text classification problem... Civil Comments [8, 43] is a text classification problem... |
| Dataset Splits | Yes | Throughout this paper, we use a group-balanced held-out dataset (subset of the validation dataset where each group has the same number of datapoints) as the reweighting dataset D̂; Kirichenko et al. [40] denote this variation of the method as DFR^Tr_Val. For WILDS-FMOW we use the standard WILDS-FMOW splits [43]. |
| Hardware Specification | Yes | Experiments were run on clusters with NVIDIA A100 (40GB) GPUs. |
| Software Dependencies | No | The paper mentions software like PyTorch [67], Weights & Biases [7], NumPy [24], SciPy [87], pandas [57], matplotlib [33], torchvision [55], and Hugging Face Transformers [92], but does not provide specific version numbers for any of these. |
| Experiment Setup | Yes | We train ERM models, RWG and RWY with the same hyper-parameters shared between all the image datasets (apart from batch size which is set to 32 on Waterbirds, and 100 on the other datasets)... For group DRO, we run a grid search over the values of the generalization adjustment C, weight decay and learning rate hyper-parameters... For ERM, RWG, and RWY we train models with learning rate 1e-5 for BERT models and 1e-4 for ResNet-50 and DenseNet-121 models. We use batch size 32 for Waterbirds, and 100 for CelebA, FMOW and CXR. For NLP models, we use batch size 16 for MultiNLI and 32 for Civil Comments. |
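The DFR^Tr_Val procedure quoted in the Dataset Splits row (retrain only the last layer on a group-balanced held-out subset, on top of frozen ERM features) can be sketched as below. This is a minimal illustration, not the paper's actual pipeline: the toy data, the two-dimensional "core"/"spurious" features, and the plain gradient-descent logistic probe are all assumptions for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

def group_balanced_subset(X, y, g, rng):
    """Subsample so every group has the same number of datapoints,
    mirroring the group-balanced reweighting dataset described above."""
    groups = np.unique(g)
    n_min = min(int((g == gr).sum()) for gr in groups)
    idx = np.concatenate([
        rng.choice(np.where(g == gr)[0], n_min, replace=False)
        for gr in groups
    ])
    return X[idx], y[idx]

def fit_last_layer(feats, y, l2=1e-3, steps=500, lr=0.1):
    """Logistic-regression probe on frozen features (binary labels in {0, 1}),
    trained by full-batch gradient descent."""
    w = np.zeros(feats.shape[1])
    b = 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(feats @ w + b)))
        grad_w = feats.T @ (p - y) / len(y) + l2 * w
        grad_b = (p - y).mean()
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Toy "frozen features": one core feature that tracks the label,
# one spurious feature that tracks the group instead.
n = 400
g = rng.integers(0, 2, n)                 # group id (e.g. background type)
y = rng.integers(0, 2, n)                 # label
core = y + 0.1 * rng.standard_normal(n)
spur = g + 0.1 * rng.standard_normal(n)
X = np.stack([core, spur], axis=1)

Xb, yb = group_balanced_subset(X, y, g, rng)
w, b = fit_last_layer(Xb, yb)
acc = (((Xb @ w + b) > 0).astype(int) == yb).mean()
print("balanced-subset accuracy:", acc)
```

Because the reweighting subset is group-balanced, the spurious feature carries little signal there, so the retrained last layer relies on the core feature; this is the mechanism the paper evaluates when asking how well core features can be decoded from learned representations.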