In-N-Out: Pre-Training and Self-Training using Auxiliary Information for Out-of-Distribution Robustness
Authors: Sang Michael Xie, Ananya Kumar, Robbie Jones, Fereshte Khani, Tengyu Ma, Percy Liang
ICLR 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirically across three image and time-series datasets, and theoretically in a multi-task linear regression setting, we show that (i) using auxiliary information as input features improves in-distribution error but can hurt OOD error; but (ii) using auxiliary information as outputs of auxiliary pre-training tasks improves OOD error. |
| Researcher Affiliation | Academia | Stanford University {xie, ananya, rmjones, fereshte, tengyuma, pliang}@cs.stanford.edu |
| Pseudocode | Yes | Algorithm 1 In-N-Out (a hedged sketch of the procedure follows the table) |
| Open Source Code | Yes | All code, data, and experiments are on CodaLab at this link. |
| Open Datasets | Yes | All code, data, and experiments are on CodaLab at this link. Our datasets use auxiliary information both derived from the input (CelebA, Cropland) and from other sources (Landcover). CelebA. In CelebA (Liu et al., 2015)... Cropland. We use the Cropland dataset from Wang et al. (2020)... Landcover. The input x is a time series measured by NASA's MODIS satellite (Vermote, 2015), the target y is one of 6 land cover classes, and the auxiliary information z is climate data (e.g., temperature) from ERA5, a dataset computed from various satellites and weather station data (C3S, 2017). |
| Dataset Splits | Yes | Data splits. We first split off the OOD data, then split the rest into training, validation, and in-distribution test (see Appendix B for details). We use a portion of the training set and OOD set as in-distribution and OOD unlabeled data respectively. The rest of the OOD set is held out as test data. We run 5 trials, where we randomly re-generate the training/unlabeled split for each trial (keeping held-out splits fixed). We use a reduced number of labeled examples from each dataset (1%, 5%, 10% of labeled examples for CelebA, Cropland, and Landcover respectively), with the rest as unlabeled. (A sketch of this splitting scheme follows the table.) |
| Hardware Specification | No | The paper mentions model architectures (e.g., ResNet18, U-Net, 1D-CNN), ImageNet pre-training, and PyTorch (via the acknowledgements), but does not provide specific hardware details such as GPU models, CPU types, or memory specifications used for training or inference. |
| Software Dependencies | No | The paper does not provide specific version numbers for software dependencies or libraries used in the experiments. |
| Experiment Setup | Yes | We use a ResNet18 as the backbone model architecture for all models (see Appendix B.1 for details). Following Wang et al. (2020), we use a U-Net-based model (Ronneberger et al., 2015). We use a 1D-CNN to handle the temporal structure in the MODIS data. We use a reduced number of labeled examples from each dataset (1%, 5%, 10% of labeled examples for CelebA, Cropland, and Landcover respectively), with the rest as unlabeled. Each method is trained with early-stopping and hyperparameters are chosen using the validation set. (A generic sketch of this early-stopping/validation-selection protocol is given after the table.) |
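
As a reading aid for the Pseudocode row, here is a minimal runnable sketch of the In-N-Out procedure (aux-inputs baseline, aux-outputs pre-training, then self-training) on synthetic data. Scikit-learn models stand in for the paper's ResNet/CNN backbones; the toy data generator, split sizes, and all names are illustrative assumptions, not the authors' implementation (which is on their CodaLab worksheet).

```python
# Hedged sketch of In-N-Out (Algorithm 1) on a synthetic regression task.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
d = 10
W_z = rng.normal(size=(d, 3))          # fixed map from input x to auxiliary info z

def make_data(n):
    """Toy data: input x, auxiliary info z (a noisy function of x), target y."""
    x = rng.normal(size=(n, d))
    z = x @ W_z + 0.1 * rng.normal(size=(n, 3))
    y = x.sum(axis=1) + 0.1 * rng.normal(size=n)
    return x, z, y

x_lab, z_lab, y_lab = make_data(200)   # small labeled set
x_unl, z_unl, _ = make_data(2000)      # larger unlabeled set (labels unused)

# (1) Aux-inputs model: auxiliary info z is concatenated to the input.
f_in = Ridge().fit(np.hstack([x_lab, z_lab]), y_lab)

# (2) Aux-outputs pre-training: predict z from x on unlabeled data, then
#     reuse the learned first-layer representation for the main task.
f_pre = MLPRegressor(hidden_layer_sizes=(32,), max_iter=500, random_state=0)
f_pre.fit(x_unl, z_unl)

def rep(x):
    # Hidden representation of the pre-trained network (ReLU first layer).
    return np.maximum(0, x @ f_pre.coefs_[0] + f_pre.intercepts_[0])

f_out = Ridge().fit(rep(x_lab), y_lab)  # fine-tune a head on the labeled set

# (3) Self-training: pseudo-label unlabeled inputs with the aux-inputs model,
#     then retrain the pre-trained model on labeled + pseudo-labeled data.
y_pseudo = f_in.predict(np.hstack([x_unl, z_unl]))
f_final = Ridge().fit(rep(np.vstack([x_lab, x_unl])),
                      np.concatenate([y_lab, y_pseudo]))
```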
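
The Dataset Splits row describes a nested scheme: split off OOD data first, divide the rest into train/validation/in-distribution test, treat part of the training and OOD data as unlabeled, and re-draw the labeled/unlabeled partition over 5 trials while keeping held-out splits fixed. The sketch below mirrors that scheme with numpy; the split fractions are placeholders, since the paper's exact per-dataset proportions are given in its Appendix B.

```python
# Hedged sketch of the split procedure; fractions are illustrative assumptions.
import numpy as np

def make_fixed_splits(n_examples, is_ood, seed=0):
    """Held-out splits, generated once: OOD is split off first, then the rest
    is divided into train / validation / in-distribution test."""
    rng = np.random.default_rng(seed)
    idx = np.arange(n_examples)
    ood = rng.permutation(idx[is_ood])
    ood_unlabeled, ood_test = np.split(ood, [len(ood) // 2])
    in_dist = rng.permutation(idx[~is_ood])
    n_train = int(0.7 * len(in_dist))
    n_val = int(0.1 * len(in_dist))
    train, val, id_test = np.split(in_dist, [n_train, n_train + n_val])
    return train, val, id_test, ood_unlabeled, ood_test

def trial_split(train, labeled_frac, trial):
    """Per-trial re-draw of the labeled/unlabeled partition of the training set."""
    rng = np.random.default_rng(trial)
    train = rng.permutation(train)
    n_labeled = int(labeled_frac * len(train))
    return train[:n_labeled], train[n_labeled:]   # labeled, in-dist unlabeled

# Example: 5 trials with 5% labeled data (the paper's Cropland setting);
# the held-out val / id_test / ood_test splits stay fixed across trials.
train, val, id_test, ood_unl, ood_test = make_fixed_splits(
    10_000, np.arange(10_000) % 5 == 0)
trials = [trial_split(train, labeled_frac=0.05, trial=t) for t in range(5)]
```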
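
The Experiment Setup row notes that every method is trained with early stopping and that hyperparameters are chosen on the validation set. Below is a generic sketch of that protocol; the `train_one_epoch`, `validate`, and `build_and_train` callables are assumptions standing in for the paper's per-dataset training code, not its API.

```python
# Generic early-stopping and validation-based model selection sketch.
import copy

def train_with_early_stopping(model, train_one_epoch, validate,
                              max_epochs=100, patience=10):
    """Train until the validation score stops improving for `patience` epochs."""
    best_score, best_model, stale_epochs = float("-inf"), None, 0
    for _ in range(max_epochs):
        train_one_epoch(model)
        score = validate(model)            # e.g., validation accuracy
        if score > best_score:
            best_score, best_model = score, copy.deepcopy(model)
            stale_epochs = 0
        else:
            stale_epochs += 1
            if stale_epochs >= patience:
                break
    return best_model, best_score

def select_hyperparameters(candidates, build_and_train):
    """Pick the hyperparameter setting with the best validation score."""
    results = [(build_and_train(hp), hp) for hp in candidates]
    (model, _), hp = max(results, key=lambda r: r[0][1])
    return model, hp
```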