Out-of-distribution Generalization in the Presence of Nuisance-Induced Spurious Correlations
Authors: Aahlad Manas Puli, Lily H Zhang, Eric Karl Oermann, Rajesh Ranganath
ICLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate NURD on several tasks including chest X-ray classification where, using non-lung patches as the nuisance, NURD produces models that predict pneumonia under strong spurious correlations. |
| Researcher Affiliation | Academia | Aahlad Puli, Lily H. Zhang, Eric K. Oermann, Rajesh Ranganath (New York University) |
| Pseudocode | Yes | Algorithm 1: Reweighting-NURD; Algorithm 2: Generative-NURD (a hedged sketch of the reweighting variant appears below the table). |
| Open Source Code | Yes | The code is available here. |
| Open Datasets | Yes | We evaluate NURD on class-conditional Gaussians, labeling colored MNIST images (Arjovsky et al., 2019), distinguishing waterbirds from landbirds, and classifying chest X-rays (Irvin et al., 2019; Johnson et al., 2019). We construct a colored-MNIST dataset (Arjovsky et al., 2019; Gulrajani and Lopez-Paz, 2020) with images of 0s and 1s. We construct a dataset by mixing two chest X-ray datasets, CheXpert and MIMIC, that have different factors that affect the whole image, with or without pneumonia. |
| Dataset Splits | Yes | We split the training data into training and validation datasets with an 80-20 split. |
| Hardware Specification | No | The paper does not specify the hardware (e.g., GPU/CPU models, memory) used to run the experiments; it describes only model architectures. |
| Software Dependencies | No | The paper mentions software components like the "Adam (Kingma and Ba, 2014) optimizer," "PixelCNN model," "VQ-VAE 2 (Razavi et al., 2019)," and "ResNet-18 models." However, it does not provide version numbers for these packages or for the programming languages/environments used. |
| Experiment Setup | Yes | Models in both steps of NURD are selected using held-out subsets of the training data. We split the training data into training and validation datasets with an 80-20 split. For all experiments, we use λ = 1 and one or two epochs of critic model updates for every predictive model update. We use the Adam (Kingma and Ba, 2014) optimizer with a learning rate of 10^-2. We optimized the model for p̂_tr(y \| z) for 100 epochs and the model for p̂_tr(x \| y, z) for 300 epochs. We ran the distillation step for 150 epochs with the Adam optimizer with the default learning rate. We use a batch size of 1000 in both stages of NURD. (These settings are collected in the configuration sketch below the table.) |
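
As a reading aid for the Pseudocode row, here is a minimal PyTorch sketch of Algorithm 1 (Reweighting-NURD), under stated assumptions: a binary label y, a nuisance z, and a critic trained with the standard density-ratio (logistic) trick standing in for the paper's conditional-independence penalty. Module names such as `rep_net`, `head`, and `critic` are hypothetical, not the authors' released implementation.

```python
import torch
import torch.nn.functional as F

def importance_weights(nuisance_logits, y, eps=1e-3):
    """w = 1 / p̂_tr(y | z): reweight training data toward the
    nuisance-randomized distribution, where y and z are independent."""
    p1 = torch.sigmoid(nuisance_logits).squeeze(-1)   # p̂_tr(y = 1 | z)
    p_y = torch.where(y.bool(), p1, 1.0 - p1)
    return 1.0 / p_y.clamp(min=eps)

def critic_step(r, z, y, critic, opt_c):
    """Train the critic to tell joint (r, z) pairs from z-shuffled pairs;
    its logit then approximates the log density ratio measuring how much
    information about z remains in the representation r."""
    r = r.detach()                       # the critic sees a frozen representation
    z_shuf = z[torch.randperm(len(z))]   # global shuffle; shuffling within each
                                         # y group would match the conditioning exactly
    scores = torch.cat([critic(r, z, y), critic(r, z_shuf, y)]).squeeze(-1)
    labels = torch.cat([torch.ones(len(z)), torch.zeros(len(z))])
    loss_c = F.binary_cross_entropy_with_logits(scores, labels)
    opt_c.zero_grad(); loss_c.backward(); opt_c.step()
    return loss_c.item()

def distill_step(x, y, z, w, rep_net, head, critic, opt, lam=1.0):
    """Predictive update: weighted cross-entropy plus λ times the critic's
    estimate of the residual dependence between r(x) and z given y."""
    r = rep_net(x)
    nll = (w * F.binary_cross_entropy_with_logits(
        head(r).squeeze(-1), y.float(), reduction="none")).mean()
    penalty = (w * critic(r, z, y).squeeze(-1)).mean()
    loss = nll + lam * penalty
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()
```

Per the Experiment Setup row, `critic_step` would run for one or two epochs between consecutive `distill_step` calls.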
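
The Experiment Setup row reduces to a handful of hyperparameters; the sketch below collects them in PyTorch. Everything except the reported numbers (80-20 split, λ = 1, learning rate 10^-2, batch size 1000, 100/300/150 epoch budgets) is an assumption: the toy dataset and two-layer model are placeholders standing in for the paper's chest X-ray pipeline.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset, random_split

# Placeholder data and model; only the hyperparameter values come from the paper.
data = TensorDataset(torch.randn(200, 16), torch.randint(0, 2, (200,)))
n_train = int(0.8 * len(data))                                     # 80-20 split
train_subset, val_subset = random_split(data, [n_train, len(data) - n_train])

predictive_model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 1))
opt = torch.optim.Adam(predictive_model.parameters(), lr=1e-2)     # lr = 10^-2
loader = DataLoader(train_subset, batch_size=1000, shuffle=True)   # batch size 1000

LAMBDA = 1.0             # λ = 1 for the critic penalty in all experiments
CRITIC_UPDATES = 2       # one or two critic epochs per predictive-model update
NUISANCE_EPOCHS = 100    # fitting p̂_tr(y | z)
GENERATIVE_EPOCHS = 300  # fitting p̂_tr(x | y, z) in Generative-NURD
DISTILL_EPOCHS = 150     # distillation step, Adam at its default lr (1e-3)
```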