Out-of-distribution Generalization in the Presence of Nuisance-Induced Spurious Correlations
Authors: Aahlad Manas Puli, Lily H Zhang, Eric Karl Oermann, Rajesh Ranganath
ICLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate NURD on several tasks including chest X-ray classification where, using non-lung patches as the nuisance, NURD produces models that predict pneumonia under strong spurious correlations. |
| Researcher Affiliation | Academia | Aahlad Puli, Lily H. Zhang, Eric K. Oermann, Rajesh Ranganath (New York University) |
| Pseudocode | Yes | Algorithm 1: Reweighting-NURD; Algorithm 2: Generative-NURD (a hedged sketch of the reweighting variant appears below the table). |
| Open Source Code | Yes | The code is available here. |
| Open Datasets | Yes | We evaluate NURD on class-conditional Gaussians, labeling colored MNIST images (Arjovsky et al., 2019), distinguishing waterbirds from landbirds, and classifying chest X-rays (Irvin et al., 2019; Johnson et al., 2019). We construct a colored-MNIST dataset (Arjovsky et al., 2019; Gulrajani and Lopez-Paz, 2020) with images of 0s and 1s. We construct a dataset by mixing two chest X-ray datasets, CheXpert and MIMIC, that have different factors that affect the whole image, with or without pneumonia. |
| Dataset Splits | Yes | We split the training data into training and validation datasets with an 80-20 split. |
| Hardware Specification | No | The paper does not specify the hardware (e.g., GPU/CPU models, memory) used to run the experiments; it describes only model architectures. |
| Software Dependencies | No | The paper mentions software components like the "Adam (Kingma and Ba, 2014) optimizer," "PixelCNN model," "VQ-VAE 2 (Razavi et al., 2019)," and "ResNet-18 models." However, it does not provide version numbers for these packages or for the programming languages/environments used. |
| Experiment Setup | Yes | Models in both steps of NURD are selected using held-out subsets of the training data. We split the training data into training and validation datasets with an 80-20 split. For all experiments, we use λ = 1 and one or two epochs of critic model updates for every predictive model update. We use the Adam (Kingma and Ba, 2014) optimizer with a learning rate of 10^-2. We optimized the model for p̂_tr(y \| z) for 100 epochs and the model for p̂_tr(x \| y, z) for 300 epochs. We ran the distillation step for 150 epochs with the Adam optimizer with the default learning rate. We use a batch size of 1000 in both stages of NURD. (These settings are collected in the configuration sketch below the table.) |
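
As a reading aid for the Pseudocode row, here is a minimal PyTorch sketch of Algorithm 1 (Reweighting-NURD), under stated assumptions: a binary label y, a nuisance z, and a critic trained with the standard density-ratio (logistic) trick standing in for the paper's conditional-independence penalty. Module names such as `rep_net`, `head`, and `critic` are hypothetical, not the authors' released implementation.

```python
import torch
import torch.nn.functional as F

def importance_weights(nuisance_logits, y, eps=1e-3):
    """w = 1 / p̂_tr(y | z): reweight training data toward the
    nuisance-randomized distribution, where y and z are independent."""
    p1 = torch.sigmoid(nuisance_logits).squeeze(-1)   # p̂_tr(y = 1 | z)
    p_y = torch.where(y.bool(), p1, 1.0 - p1)
    return 1.0 / p_y.clamp(min=eps)

def critic_step(r, z, y, critic, opt_c):
    """Train the critic to tell joint (r, z) pairs from z-shuffled pairs;
    its logit then approximates the log density ratio measuring how much
    information about z remains in the representation r."""
    r = r.detach()                       # the critic sees a frozen representation
    z_shuf = z[torch.randperm(len(z))]   # global shuffle; shuffling within each
                                         # y group would match the conditioning exactly
    scores = torch.cat([critic(r, z, y), critic(r, z_shuf, y)]).squeeze(-1)
    labels = torch.cat([torch.ones(len(z)), torch.zeros(len(z))])
    loss_c = F.binary_cross_entropy_with_logits(scores, labels)
    opt_c.zero_grad(); loss_c.backward(); opt_c.step()
    return loss_c.item()

def distill_step(x, y, z, w, rep_net, head, critic, opt, lam=1.0):
    """Predictive update: weighted cross-entropy plus λ times the critic's
    estimate of the residual dependence between r(x) and z given y."""
    r = rep_net(x)
    nll = (w * F.binary_cross_entropy_with_logits(
        head(r).squeeze(-1), y.float(), reduction="none")).mean()
    penalty = (w * critic(r, z, y).squeeze(-1)).mean()
    loss = nll + lam * penalty
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()
```

Per the Experiment Setup row, `critic_step` would run for one or two epochs between consecutive `distill_step` calls.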
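
The Experiment Setup row reduces to a handful of hyperparameters; the sketch below collects them in PyTorch. Everything except the reported numbers (80-20 split, λ = 1, learning rate 10^-2, batch size 1000, 100/300/150 epoch budgets) is an assumption: the toy dataset and two-layer model are placeholders standing in for the paper's chest X-ray pipeline.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset, random_split

# Placeholder data and model; only the hyperparameter values come from the paper.
data = TensorDataset(torch.randn(200, 16), torch.randint(0, 2, (200,)))
n_train = int(0.8 * len(data))                                     # 80-20 split
train_subset, val_subset = random_split(data, [n_train, len(data) - n_train])

predictive_model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 1))
opt = torch.optim.Adam(predictive_model.parameters(), lr=1e-2)     # lr = 10^-2
loader = DataLoader(train_subset, batch_size=1000, shuffle=True)   # batch size 1000

LAMBDA = 1.0             # λ = 1 for the critic penalty in all experiments
CRITIC_UPDATES = 2       # one or two critic epochs per predictive-model update
NUISANCE_EPOCHS = 100    # fitting p̂_tr(y | z)
GENERATIVE_EPOCHS = 300  # fitting p̂_tr(x | y, z) in Generative-NURD
DISTILL_EPOCHS = 150     # distillation step, Adam at its default lr (1e-3)
```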