Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Improving Out-of-Distribution Robustness via Selective Augmentation

Authors: Huaxiu Yao, Yu Wang, Sai Li, Linjun Zhang, Weixin Liang, James Zou, Chelsea Finn

ICML 2022 | Venue PDF | LLM Run Details

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we conduct comprehensive experiments to evaluate the effectiveness of LISA. Specifically, we aim to answer the following questions: Q1: Compared to prior methods, can LISA improve robustness to subpopulation shifts and domain shifts (Section 4.1 and Section 4.2)? Q2: Which aspects of LISA are the most important for improving robustness (Section 4.3)? Q3: Does LISA successfully produce more invariant predictors (Section 4.4)? Q4: How does LISA perform with varying degrees of distribution shifts (Section 4.5)? |
| Researcher Affiliation | Academia | 1 Stanford University, CA, USA; 2 University of California San Diego, CA, USA; 3 Renmin University of China, Beijing, China; 4 Rutgers University, NJ, USA. |
| Pseudocode | Yes | Algorithm 1: Training Procedure of LISA |
| Open Source Code | Yes | Code is released at https://github.com/huaxiuyao/LISA |
| Open Datasets | Yes | We classify MNIST digits from 2 classes... The data sizes of train, validation, and test sets are 30000, 10000, and 20000, respectively. Following (Arjovsky et al., 2019), we flip labels with probability 0.25. |
| Dataset Splits | Yes | The data sizes of train, validation, and test sets are 30000, 10000, and 20000, respectively. |
| Hardware Specification | No | The paper mentions using a pre-trained ResNet-50 and DistilBERT-uncased as models but does not specify any hardware details such as GPU models, CPU, or memory used for training or inference. |
| Software Dependencies | No | The paper mentions the ResNet-50 and DistilBERT architectures and the SGD and Adam optimizers, but it does not provide version numbers for any software libraries or dependencies (e.g., Python, PyTorch, TensorFlow, CUDA). |
| Experiment Setup | Yes | All hyperparameters are selected via cross-validation and are listed in Table 9 (subpopulation shifts) and Table 12 (domain shifts); both tables report learning rate, weight decay, scheduler, batch size, type of mixup, architecture, optimizer, maximum epoch, and strategy selection probability psel. |
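The Pseudocode row points to Algorithm 1, LISA's training procedure, and the Experiment Setup row mentions a strategy selection probability psel. As a rough illustration of the selective-augmentation idea — with probability psel, mix samples that share a label but come from different domains (intra-label LISA); otherwise, mix samples from the same domain with different labels (intra-domain LISA) — here is a minimal sketch. The function names, the list-of-tuples batch layout, and the Beta(2, 2) mixing ratio are illustrative assumptions, not the authors' released code.

```python
import random

def mixup(x1, y1, x2, y2, lam):
    """Standard mixup: convex combination of two examples' features and labels."""
    xm = [lam * a + (1 - lam) * b for a, b in zip(x1, x2)]
    ym = [lam * a + (1 - lam) * b for a, b in zip(y1, y2)]
    return xm, ym

def lisa_pair(batch, p_sel, rng):
    """Pick a mixup partner following LISA's two selective strategies.

    batch: list of (features, one_hot_label, domain) tuples.
    With probability p_sel, use intra-label mixup (same label,
    different domain); otherwise intra-domain mixup (same domain,
    different label). Falls back to any other sample if no
    candidate exists. Illustrative sketch only.
    """
    x, y, d = rng.choice(batch)
    if rng.random() < p_sel:  # intra-label LISA
        cands = [b for b in batch if b[1] == y and b[2] != d]
    else:                     # intra-domain LISA
        cands = [b for b in batch if b[2] == d and b[1] != y]
    if not cands:             # no partner under the chosen strategy
        cands = [b for b in batch if b != (x, y, d)]
    x2, y2, _ = rng.choice(cands)
    lam = rng.betavariate(2.0, 2.0)  # mixing ratio; Beta(2, 2) is an assumed choice
    return mixup(x, y, x2, y2, lam)
```

Note that under the intra-label strategy the two labels are identical, so the interpolated label stays valid while only the features (and hence domains) are mixed — which is what makes the augmentation "selective."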
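The dataset rows quote fixed split sizes (30000 / 10000 / 20000) and a 0.25 label-flip probability following Arjovsky et al. (2019). A minimal sketch of that split-and-flip bookkeeping; `make_splits` and its parameters are hypothetical, and the index-parity labels are placeholders standing in for the real binary MNIST labels.

```python
import random

def make_splits(n_total=60000, sizes=(30000, 10000, 20000),
                flip_prob=0.25, seed=0):
    """Partition example indices into train/val/test of the quoted sizes
    and flip each binary label with the given probability, mirroring the
    CMNIST-style protocol the paper follows. Placeholder labels only."""
    rng = random.Random(seed)
    idx = list(range(n_total))
    rng.shuffle(idx)
    splits, start = {}, 0
    for name, size in zip(("train", "val", "test"), sizes):
        splits[name] = idx[start:start + size]
        start += size
    labels = {i: i % 2 for i in idx}  # placeholder binary labels
    # flip each label independently with probability flip_prob
    noisy = {i: y ^ (rng.random() < flip_prob) for i, y in labels.items()}
    return splits, noisy
```

The three splits are disjoint by construction, and with 60000 examples the realized flip fraction concentrates tightly around 0.25.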