Out-of-Domain Robustness via Targeted Augmentations

Authors: Irena Gao, Shiori Sagawa, Pang Wei Koh, Tatsunori Hashimoto, Percy Liang

ICML 2023

Reproducibility Variable Result LLM Response
Research Type Experimental We prove that targeted augmentations improve OOD performance, allowing models to generalize better with fewer domains. In experiments on three real-world datasets, we show that targeted augmentations set new states-of-the-art for OOD performance by 3.2–15.2%.
Researcher Affiliation Collaboration 1Stanford University 2University of Washington 3Google Brain.
Pseudocode Yes Algorithm 1 Copy-Paste and Algorithm 2 Stain Color Jitter Augmentation.
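The Stain Color Jitter augmentation named above (the paper's Algorithm 2, applied to CAMELYON17-WILDS histopathology images) can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the stain basis, jitter parameterization, and `strength` range are assumptions, using an approximate hematoxylin/eosin optical-density basis.

```python
import numpy as np

# Approximate H&E stain basis in optical-density space (illustrative values,
# not taken from the paper).
STAIN_BASIS = np.array([
    [0.65, 0.70, 0.29],   # hematoxylin (approximate)
    [0.07, 0.99, 0.11],   # eosin (approximate)
    [0.27, 0.57, 0.78],   # residual channel
])

def stain_color_jitter(img, strength=0.05, rng=None):
    """Jitter an RGB image (floats in [0, 1]) in stain concentration space.

    Sketch of the idea: convert RGB to optical density, project onto a
    stain basis, randomly scale and shift each stain channel, then map back.
    """
    rng = np.random.default_rng() if rng is None else rng
    od = -np.log(np.clip(img, 1e-6, 1.0))        # RGB -> optical density
    conc = od @ np.linalg.inv(STAIN_BASIS)       # stain concentrations
    alpha = 1.0 + rng.uniform(-strength, strength, size=3)  # per-stain scale
    beta = rng.uniform(-strength, strength, size=3)         # per-stain shift
    jittered = conc * alpha + beta
    return np.clip(np.exp(-(jittered @ STAIN_BASIS)), 0.0, 1.0)
```

With `strength=0` the transform is (up to clipping) the identity, which makes the perturbation easy to sanity-check before training.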
Open Source Code Yes Code and BIRDCALLS are released at this link.
Open Datasets Yes Empirically, we show targeted augmentations are effective on three real-world datasets spanning biomedical and wildlife monitoring applications: CAMELYON17-WILDS (Bandi et al., 2018; Koh et al., 2021), IWILDCAM2020-WILDS (Beery et al., 2021; Koh et al., 2021), and BIRDCALLS, which we curate from ornithology datasets (Navine et al., 2022; Hopping et al., 2022; Kahl et al., 2022). We release BIRDCALLS at this link.
Dataset Splits Yes In real-world experiments and simulations, we estimate OOD performance by evaluating on held-out domains Dtest, where Dtest ∩ Dtrain = ∅. Model selection and early stopping were done on the OOD validation split of iWildCam, which measures performance on a held-out set of cameras Dval, which is disjoint from both Dtrain and Dtest.
Hardware Specification No The paper does not provide specific hardware details such as exact GPU/CPU models, processor types, or memory amounts used for running experiments.
Software Dependencies No The paper mentions software components like 'EfficientNet-B0' and 'PyTorch' (implicitly via citations to papers using PyTorch), but does not provide specific version numbers for these or other ancillary software.
Experiment Setup Yes All experiments used a ResNet-50, pretrained on ImageNet, with no weight decay and batch size 24, following Sagawa et al. (2021); Koh et al. (2021). Model selection and early stopping were done on the OOD validation split of iWildCam, which measures performance on a held-out set of cameras Dval, which is disjoint from both Dtrain and Dtest. We tuned all methods by fixing a budget of 10 tuning runs per method with one replicate each; the hyperparameter grids are given in Table 8.