Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Improving Out-of-Distribution Robustness via Selective Augmentation

Authors: Huaxiu Yao, Yu Wang, Sai Li, Linjun Zhang, Weixin Liang, James Zou, Chelsea Finn

ICML 2022 | Venue PDF | LLM Run Details

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we conduct comprehensive experiments to evaluate the effectiveness of LISA. Specifically, we aim to answer the following questions: Q1: Compared to prior methods, can LISA improve robustness to subpopulation shifts and domain shifts (Section 4.1 and Section 4.2)? Q2: Which aspects of LISA are the most important for improving robustness (Section 4.3)? Q3: Does LISA successfully produce more invariant predictors (Section 4.4)? Q4: How does LISA perform with varying degrees of distribution shifts (Section 4.5)? |
| Researcher Affiliation | Academia | 1 Stanford University, CA, USA; 2 University of California San Diego, CA, USA; 3 Renmin University of China, Beijing, China; 4 Rutgers University, NJ, USA. |
| Pseudocode | Yes | Algorithm 1: Training Procedure of LISA |
| Open Source Code | Yes | Code is released at https://github.com/huaxiuyao/LISA |
| Open Datasets | Yes | We classify MNIST digits from 2 classes... The data sizes of train, validation, and test sets are 30000, 10000, and 20000, respectively. Following (Arjovsky et al., 2019), we flip labels with probability 0.25. |
| Dataset Splits | Yes | The data sizes of train, validation, and test sets are 30000, 10000, and 20000, respectively. |
| Hardware Specification | No | The paper mentions using a pre-trained ResNet-50 and DistilBERT-uncased as models but does not specify any hardware details such as GPU models, CPU, or memory used for training or inference. |
| Software Dependencies | No | The paper mentions the ResNet-50 and DistilBERT architectures and the SGD and Adam optimizers, but it does not provide version numbers for any software libraries or dependencies (e.g., Python, PyTorch, TensorFlow, CUDA). |
| Experiment Setup | Yes | All hyperparameters are selected via cross-validation and are listed in Table 9 (subpopulation shifts) and Table 12 (domain shifts); both tables report learning rate, weight decay, scheduler, batch size, type of mixup, architecture, optimizer, maximum epoch, and strategy selection probability psel. |
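The Pseudocode row points to Algorithm 1, LISA's training procedure, and the Experiment Setup row mentions a strategy selection probability psel. As a rough illustration of the selective-augmentation idea — with probability psel, mix samples that share a label but come from different domains (intra-label LISA); otherwise, mix samples from the same domain with different labels (intra-domain LISA) — here is a minimal sketch. The function names, the list-of-tuples batch layout, and the Beta(2, 2) mixing ratio are illustrative assumptions, not the authors' released code.

```python
import random

def mixup(x1, y1, x2, y2, lam):
    """Standard mixup: convex combination of two examples' features and labels."""
    xm = [lam * a + (1 - lam) * b for a, b in zip(x1, x2)]
    ym = [lam * a + (1 - lam) * b for a, b in zip(y1, y2)]
    return xm, ym

def lisa_pair(batch, p_sel, rng):
    """Pick a mixup partner following LISA's two selective strategies.

    batch: list of (features, one_hot_label, domain) tuples.
    With probability p_sel, use intra-label mixup (same label,
    different domain); otherwise intra-domain mixup (same domain,
    different label). Falls back to any other sample if no
    candidate exists. Illustrative sketch only.
    """
    x, y, d = rng.choice(batch)
    if rng.random() < p_sel:  # intra-label LISA
        cands = [b for b in batch if b[1] == y and b[2] != d]
    else:                     # intra-domain LISA
        cands = [b for b in batch if b[2] == d and b[1] != y]
    if not cands:             # no partner under the chosen strategy
        cands = [b for b in batch if b != (x, y, d)]
    x2, y2, _ = rng.choice(cands)
    lam = rng.betavariate(2.0, 2.0)  # mixing ratio; Beta(2, 2) is an assumed choice
    return mixup(x, y, x2, y2, lam)
```

Note that under the intra-label strategy the two labels are identical, so the interpolated label stays valid while only the features (and hence domains) are mixed — which is what makes the augmentation "selective."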
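The dataset rows quote fixed split sizes (30000 / 10000 / 20000) and a 0.25 label-flip probability following Arjovsky et al. (2019). A minimal sketch of that split-and-flip bookkeeping; `make_splits` and its parameters are hypothetical, and the index-parity labels are placeholders standing in for the real binary MNIST labels.

```python
import random

def make_splits(n_total=60000, sizes=(30000, 10000, 20000),
                flip_prob=0.25, seed=0):
    """Partition example indices into train/val/test of the quoted sizes
    and flip each binary label with the given probability, mirroring the
    CMNIST-style protocol the paper follows. Placeholder labels only."""
    rng = random.Random(seed)
    idx = list(range(n_total))
    rng.shuffle(idx)
    splits, start = {}, 0
    for name, size in zip(("train", "val", "test"), sizes):
        splits[name] = idx[start:start + size]
        start += size
    labels = {i: i % 2 for i in idx}  # placeholder binary labels
    # flip each label independently with probability flip_prob
    noisy = {i: y ^ (rng.random() < flip_prob) for i, y in labels.items()}
    return splits, noisy
```

The three splits are disjoint by construction, and with 60000 examples the realized flip fraction concentrates tightly around 0.25.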