In-N-Out: Pre-Training and Self-Training using Auxiliary Information for Out-of-Distribution Robustness
Authors: Sang Michael Xie, Ananya Kumar, Robbie Jones, Fereshte Khani, Tengyu Ma, Percy Liang
ICLR 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirically across three image and time-series datasets, and theoretically in a multi-task linear regression setting, we show that (i) using auxiliary information as input features improves in-distribution error but can hurt OOD error; but (ii) using auxiliary information as outputs of auxiliary pre-training tasks improves OOD error. |
| Researcher Affiliation | Academia | Stanford University {xie, ananya, rmjones, fereshte, tengyuma, pliang}@cs.stanford.edu |
| Pseudocode | Yes | Algorithm 1 In-N-Out (a hedged sketch of the procedure follows the table) |
| Open Source Code | Yes | All code, data, and experiments are on CodaLab at this link. |
| Open Datasets | Yes | All code, data, and experiments are on CodaLab at this link. Our datasets use auxiliary information both derived from the input (CelebA, Cropland) and from other sources (Landcover). CelebA. In CelebA (Liu et al., 2015)... Cropland. We use the Cropland dataset from Wang et al. (2020)... Landcover. The input x is a time series measured by NASA's MODIS satellite (Vermote, 2015), the target y is one of 6 land cover classes, and the auxiliary information z is climate data (e.g., temperature) from ERA5, a dataset computed from various satellites and weather station data (C3S, 2017). |
| Dataset Splits | Yes | Data splits. We first split off the OOD data, then split the rest into training, validation, and in-distribution test (see Appendix B for details). We use a portion of the training set and OOD set as in-distribution and OOD unlabeled data respectively. The rest of the OOD set is held out as test data. We run 5 trials, where we randomly re-generate the training/unlabeled split for each trial (keeping held-out splits fixed). We use a reduced number of labeled examples from each dataset (1%, 5%, 10% of labeled examples for CelebA, Cropland, and Landcover respectively), with the rest as unlabeled. (A sketch of this splitting scheme follows the table.) |
| Hardware Specification | No | The paper mentions model architectures (e.g., ResNet18, U-Net, 1D-CNN), ImageNet pre-training, and PyTorch (via the acknowledgements), but does not provide specific hardware details such as GPU models, CPU types, or memory specifications used for training or inference. |
| Software Dependencies | No | The paper does not provide specific version numbers for software dependencies or libraries used in the experiments. |
| Experiment Setup | Yes | We use a ResNet18 as the backbone model architecture for all models (see Appendix B.1 for details). Following Wang et al. (2020), we use a U-Net-based model (Ronneberger et al., 2015). We use a 1D-CNN to handle the temporal structure in the MODIS data. We use a reduced number of labeled examples from each dataset (1%, 5%, 10% of labeled examples for CelebA, Cropland, and Landcover respectively), with the rest as unlabeled. Each method is trained with early-stopping and hyperparameters are chosen using the validation set. (A generic sketch of this early-stopping/validation-selection protocol is given after the table.) |
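
As a reading aid for the Pseudocode row, here is a minimal runnable sketch of the In-N-Out procedure (aux-inputs baseline, aux-outputs pre-training, then self-training) on synthetic data. Scikit-learn models stand in for the paper's ResNet/CNN backbones; the toy data generator, split sizes, and all names are illustrative assumptions, not the authors' implementation (which is on their CodaLab worksheet).

```python
# Hedged sketch of In-N-Out (Algorithm 1) on a synthetic regression task.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
d = 10
W_z = rng.normal(size=(d, 3))          # fixed map from input x to auxiliary info z

def make_data(n):
    """Toy data: input x, auxiliary info z (a noisy function of x), target y."""
    x = rng.normal(size=(n, d))
    z = x @ W_z + 0.1 * rng.normal(size=(n, 3))
    y = x.sum(axis=1) + 0.1 * rng.normal(size=n)
    return x, z, y

x_lab, z_lab, y_lab = make_data(200)   # small labeled set
x_unl, z_unl, _ = make_data(2000)      # larger unlabeled set (labels unused)

# (1) Aux-inputs model: auxiliary info z is concatenated to the input.
f_in = Ridge().fit(np.hstack([x_lab, z_lab]), y_lab)

# (2) Aux-outputs pre-training: predict z from x on unlabeled data, then
#     reuse the learned first-layer representation for the main task.
f_pre = MLPRegressor(hidden_layer_sizes=(32,), max_iter=500, random_state=0)
f_pre.fit(x_unl, z_unl)

def rep(x):
    # Hidden representation of the pre-trained network (ReLU first layer).
    return np.maximum(0, x @ f_pre.coefs_[0] + f_pre.intercepts_[0])

f_out = Ridge().fit(rep(x_lab), y_lab)  # fine-tune a head on the labeled set

# (3) Self-training: pseudo-label unlabeled inputs with the aux-inputs model,
#     then retrain the pre-trained model on labeled + pseudo-labeled data.
y_pseudo = f_in.predict(np.hstack([x_unl, z_unl]))
f_final = Ridge().fit(rep(np.vstack([x_lab, x_unl])),
                      np.concatenate([y_lab, y_pseudo]))
```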
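
The Dataset Splits row describes a nested scheme: split off OOD data first, divide the rest into train/validation/in-distribution test, treat part of the training and OOD data as unlabeled, and re-draw the labeled/unlabeled partition over 5 trials while keeping held-out splits fixed. The sketch below mirrors that scheme with numpy; the split fractions are placeholders, since the paper's exact per-dataset proportions are given in its Appendix B.

```python
# Hedged sketch of the split procedure; fractions are illustrative assumptions.
import numpy as np

def make_fixed_splits(n_examples, is_ood, seed=0):
    """Held-out splits, generated once: OOD is split off first, then the rest
    is divided into train / validation / in-distribution test."""
    rng = np.random.default_rng(seed)
    idx = np.arange(n_examples)
    ood = rng.permutation(idx[is_ood])
    ood_unlabeled, ood_test = np.split(ood, [len(ood) // 2])
    in_dist = rng.permutation(idx[~is_ood])
    n_train = int(0.7 * len(in_dist))
    n_val = int(0.1 * len(in_dist))
    train, val, id_test = np.split(in_dist, [n_train, n_train + n_val])
    return train, val, id_test, ood_unlabeled, ood_test

def trial_split(train, labeled_frac, trial):
    """Per-trial re-draw of the labeled/unlabeled partition of the training set."""
    rng = np.random.default_rng(trial)
    train = rng.permutation(train)
    n_labeled = int(labeled_frac * len(train))
    return train[:n_labeled], train[n_labeled:]   # labeled, in-dist unlabeled

# Example: 5 trials with 5% labeled data (the paper's Cropland setting);
# the held-out val / id_test / ood_test splits stay fixed across trials.
train, val, id_test, ood_unl, ood_test = make_fixed_splits(
    10_000, np.arange(10_000) % 5 == 0)
trials = [trial_split(train, labeled_frac=0.05, trial=t) for t in range(5)]
```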
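
The Experiment Setup row notes that every method is trained with early stopping and that hyperparameters are chosen on the validation set. Below is a generic sketch of that protocol; the `train_one_epoch`, `validate`, and `build_and_train` callables are assumptions standing in for the paper's per-dataset training code, not its API.

```python
# Generic early-stopping and validation-based model selection sketch.
import copy

def train_with_early_stopping(model, train_one_epoch, validate,
                              max_epochs=100, patience=10):
    """Train until the validation score stops improving for `patience` epochs."""
    best_score, best_model, stale_epochs = float("-inf"), None, 0
    for _ in range(max_epochs):
        train_one_epoch(model)
        score = validate(model)            # e.g., validation accuracy
        if score > best_score:
            best_score, best_model = score, copy.deepcopy(model)
            stale_epochs = 0
        else:
            stale_epochs += 1
            if stale_epochs >= patience:
                break
    return best_model, best_score

def select_hyperparameters(candidates, build_and_train):
    """Pick the hyperparameter setting with the best validation score."""
    results = [(build_and_train(hp), hp) for hp in candidates]
    (model, _), hp = max(results, key=lambda r: r[0][1])
    return model, hp
```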