Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Improving Out-of-Distribution Robustness via Selective Augmentation
Authors: Huaxiu Yao, Yu Wang, Sai Li, Linjun Zhang, Weixin Liang, James Zou, Chelsea Finn
ICML 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we conduct comprehensive experiments to evaluate the effectiveness of LISA. Specifically, we aim to answer the following questions: Q1: Compared to prior methods, can LISA improve robustness to subpopulation shifts and domain shifts (Section 4.1 and Section 4.2)? Q2: Which aspects of LISA are the most important for improving robustness (Section 4.3)? Q3: Does LISA successfully produce more invariant predictors (Section 4.4)? Q4: How does LISA perform with varying degrees of distribution shifts (Section 4.5)? |
| Researcher Affiliation | Academia | 1Stanford University, CA, USA 2University of California San Diego, CA, USA 3Renmin University of China, Beijing, China 4Rutgers University, NJ, USA. |
| Pseudocode | Yes | Algorithm 1 Training Procedure of LISA |
| Open Source Code | Yes | Code is released in https://github.com/huaxiuyao/LISA |
| Open Datasets | Yes | We classify MNIST digits from 2 classes... The data sizes of train, validation, and test sets are 30000, 10000, and 20000, respectively. Following (Arjovsky et al., 2019), we flip labels with probability 0.25. |
| Dataset Splits | Yes | The data sizes of train, validation, and test sets are 30000, 10000, and 20000, respectively. |
| Hardware Specification | No | The paper mentions using 'pre-trained ResNet-50' and 'DistilBERT-uncased' as models but does not specify any hardware details like GPU models, CPU, or memory used for training or inference. |
| Software Dependencies | No | The paper mentions using the 'pre-trained ResNet-50' and 'DistilBERT' architectures and the 'SGD' and 'Adam' optimizers, but it does not provide specific version numbers for any software libraries or dependencies (e.g., Python, PyTorch, TensorFlow, CUDA versions). |
| Experiment Setup | Yes | All hyperparameters are selected via cross-validation and are listed in Table 9. Tables 9 and 12 give the hyperparameter settings for the subpopulation shifts and the domain shifts, respectively: learning rate, weight decay, scheduler, batch size, type of mixup, architecture, optimizer, maximum epoch, and strategy selection probability p_sel. |
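The pseudocode and hyperparameter rows above refer to LISA's selective-mixup training procedure, which mixes pairs of examples chosen by label and domain rather than at random. The following is a minimal, hedged sketch of that selection step, not the authors' implementation: the function name `lisa_batch`, the `(features, one_hot_label, domain_id)` batch format, and the fallback for examples with no valid partner are all assumptions made here for illustration; `p_sel` corresponds to the strategy selection probability listed in Tables 9 and 12.

```python
import random

def interpolate(a, b, lam):
    # Element-wise convex combination lam*a + (1-lam)*b (vanilla mixup).
    return [lam * u + (1 - lam) * v for u, v in zip(a, b)]

def lisa_batch(batch, p_sel=0.5, lam=0.5):
    """Selective augmentation in the spirit of LISA's Algorithm 1 (sketch).

    batch: list of (features, one_hot_label, domain_id) tuples -- an
    assumed format, not the paper's. With probability p_sel, each example
    is mixed with one sharing its label but from a different domain
    (intra-label LISA); otherwise with one from the same domain but a
    different label (intra-domain LISA). Features and labels are both
    interpolated.
    """
    mixed = []
    for x, y, d in batch:
        if random.random() < p_sel:
            # Intra-label: same label, different domain.
            cands = [(x2, y2) for x2, y2, d2 in batch if y2 == y and d2 != d]
        else:
            # Intra-domain: same domain, different label.
            cands = [(x2, y2) for x2, y2, d2 in batch if y2 != y and d2 == d]
        if not cands:
            # No valid partner in this batch: keep the example unmixed
            # (a fallback assumed here, not specified by the paper).
            mixed.append((x, y, d))
            continue
        x2, y2 = random.choice(cands)
        mixed.append((interpolate(x, x2, lam), interpolate(y, y2, lam), d))
    return mixed
```

Note that with intra-label mixing the interpolated label equals the original label, which is the mechanism LISA uses to encourage domain-invariant predictors.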