Out-of-Domain Robustness via Targeted Augmentations

Authors: Irena Gao, Shiori Sagawa, Pang Wei Koh, Tatsunori Hashimoto, Percy Liang

ICML 2023

Reproducibility Variable Result LLM Response
Research Type Experimental We prove that targeted augmentations improve OOD performance, allowing models to generalize better with fewer domains. In experiments on three real-world datasets, we show that targeted augmentations set new states-of-the-art for OOD performance by 3.2–15.2%.
Researcher Affiliation Collaboration 1Stanford University 2University of Washington 3Google Brain.
Pseudocode Yes Algorithm 1 Copy-Paste and Algorithm 2 Stain Color Jitter Augmentation.
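The Stain Color Jitter augmentation named above (the paper's Algorithm 2, applied to CAMELYON17-WILDS histopathology images) can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the stain basis, jitter parameterization, and `strength` range are assumptions, using an approximate hematoxylin/eosin optical-density basis.

```python
import numpy as np

# Approximate H&E stain basis in optical-density space (illustrative values,
# not taken from the paper).
STAIN_BASIS = np.array([
    [0.65, 0.70, 0.29],   # hematoxylin (approximate)
    [0.07, 0.99, 0.11],   # eosin (approximate)
    [0.27, 0.57, 0.78],   # residual channel
])

def stain_color_jitter(img, strength=0.05, rng=None):
    """Jitter an RGB image (floats in [0, 1]) in stain concentration space.

    Sketch of the idea: convert RGB to optical density, project onto a
    stain basis, randomly scale and shift each stain channel, then map back.
    """
    rng = np.random.default_rng() if rng is None else rng
    od = -np.log(np.clip(img, 1e-6, 1.0))        # RGB -> optical density
    conc = od @ np.linalg.inv(STAIN_BASIS)       # stain concentrations
    alpha = 1.0 + rng.uniform(-strength, strength, size=3)  # per-stain scale
    beta = rng.uniform(-strength, strength, size=3)         # per-stain shift
    jittered = conc * alpha + beta
    return np.clip(np.exp(-(jittered @ STAIN_BASIS)), 0.0, 1.0)
```

With `strength=0` the transform is (up to clipping) the identity, which makes the perturbation easy to sanity-check before training.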
Open Source Code Yes Code and BIRDCALLS are released at this link.
Open Datasets Yes Empirically, we show targeted augmentations are effective on three real-world datasets spanning biomedical and wildlife monitoring applications: CAMELYON17-WILDS (Bandi et al., 2018; Koh et al., 2021), IWILDCAM2020-WILDS (Beery et al., 2021; Koh et al., 2021), and BIRDCALLS, which we curate from ornithology datasets (Navine et al., 2022; Hopping et al., 2022; Kahl et al., 2022). We release BIRDCALLS at this link.
Dataset Splits Yes In real-world experiments and simulations, we estimate OOD performance by evaluating on held-out domains Dtest, where Dtest ∩ Dtrain = ∅. Model selection and early stopping were done on the OOD validation split of iWildCam, which measures performance on a held-out set of cameras Dval, which is disjoint from both Dtrain and Dtest.
Hardware Specification No The paper does not provide specific hardware details such as exact GPU/CPU models, processor types, or memory amounts used for running experiments.
Software Dependencies No The paper mentions software components like 'EfficientNet-B0' and 'PyTorch' (implicitly via citations to papers using PyTorch), but does not provide specific version numbers for these or other ancillary software.
Experiment Setup Yes All experiments used a ResNet-50, pretrained on ImageNet, with no weight decay and batch size 24, following Sagawa et al. (2021); Koh et al. (2021). Model selection and early stopping were done on the OOD validation split of iWildCam, which measures performance on a held-out set of cameras Dval, which is disjoint from both Dtrain and Dtest. We tuned all methods by fixing a budget of 10 tuning runs per method with one replicate each; the hyperparameter grids are given in Table 8.