Out-of-Domain Robustness via Targeted Augmentations
Authors: Irena Gao, Shiori Sagawa, Pang Wei Koh, Tatsunori Hashimoto, Percy Liang
ICML 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We prove that targeted augmentations improve OOD performance, allowing models to generalize better with fewer domains. In experiments on three real-world datasets, we show that targeted augmentations set new states-of-the-art for OOD performance by 3.2–15.2%. |
| Researcher Affiliation | Collaboration | 1Stanford University 2University of Washington 3Google Brain. |
| Pseudocode | Yes | Algorithm 1 Copy-Paste and Algorithm 2 Stain Color Jitter Augmentation. |
| Open Source Code | Yes | Code and BIRDCALLS are released at this link. |
| Open Datasets | Yes | Empirically, we show targeted augmentations are effective on three real-world datasets spanning biomedical and wildlife monitoring applications: CAMELYON17-WILDS (Bandi et al., 2018; Koh et al., 2021), IWILDCAM2020-WILDS (Beery et al., 2021; Koh et al., 2021), and BIRDCALLS, which we curate from ornithology datasets (Navine et al., 2022; Hopping et al., 2022; Kahl et al., 2022). We release BIRDCALLS at this link. |
| Dataset Splits | Yes | In real-world experiments and simulations, we estimate OOD performance by evaluating on held-out domains Dtest, where Dtest ∩ Dtrain = ∅. Model selection and early stopping were done on the OOD validation split of iWildCam, which measures performance on a held-out set of cameras Dval, which is disjoint from both Dtrain and Dtest. |
| Hardware Specification | No | The paper does not provide specific hardware details such as exact GPU/CPU models, processor types, or memory amounts used for running experiments. |
| Software Dependencies | No | The paper mentions software components like 'EfficientNet-B0' and 'PyTorch' (implicitly via citations to papers using PyTorch), but does not provide specific version numbers for these or other ancillary software. |
| Experiment Setup | Yes | All experiments used a ResNet-50, pretrained on ImageNet, with no weight decay and batch size 24, following Sagawa et al. (2021); Koh et al. (2021). Model selection and early stopping were done on the OOD validation split of iWildCam, which measures performance on a held-out set of cameras Dval, which is disjoint from both Dtrain and Dtest. We tuned all methods by fixing a budget of 10 tuning runs per method with one replicate each; the hyperparameter grids are given in Table 8. |
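The table above notes that the paper gives pseudocode for Algorithm 1, Copy-Paste. At its core, this style of augmentation pastes masked foreground pixels (e.g., a segmented animal) onto a background image drawn from a different domain. The following is a minimal NumPy sketch for illustration only, not the authors' released implementation; the function name, array shapes, and toy data are assumptions.

```python
import numpy as np

def copy_paste(foreground: np.ndarray, mask: np.ndarray,
               background: np.ndarray) -> np.ndarray:
    """Paste masked foreground pixels onto a same-shaped background.

    foreground, background: (H, W, 3) uint8 images
    mask: (H, W) boolean array, True where the pasted object is
    """
    out = background.copy()
    out[mask] = foreground[mask]  # overwrite only the masked region
    return out

# Toy example: paste a 2x2 patch of a random image onto a black background.
rng = np.random.default_rng(0)
fg = rng.integers(0, 256, size=(4, 4, 3), dtype=np.uint8)
bg = np.zeros((4, 4, 3), dtype=np.uint8)
m = np.zeros((4, 4), dtype=bool)
m[1:3, 1:3] = True
aug = copy_paste(fg, m, bg)
```

In the paper's iWildCam setting the background would be sampled from another camera, which randomizes the domain-specific background while preserving the label-relevant foreground.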
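The table also notes Algorithm 2, Stain Color Jitter, used for the histopathology dataset. The general idea of stain jitter is to deconvolve an RGB patch into stain concentrations, randomly perturb each stain channel, and reconvolve. Below is a hedged NumPy sketch, not the paper's implementation; the stain matrix values (standard Ruifrok-Johnston-style H&E vectors), the function name, and the jitter magnitude `sigma` are all assumptions.

```python
import numpy as np

# Assumed H&E stain vectors (rows: hematoxylin, eosin, residual);
# exact values vary across implementations.
STAIN_MATRIX = np.array([[0.65, 0.70, 0.29],
                         [0.07, 0.99, 0.11],
                         [0.27, 0.57, 0.78]])

def stain_color_jitter(image, sigma=0.05, rng=None):
    """Jitter an RGB image (floats in [0, 1]) in an assumed stain space.

    Converts to optical density, deconvolves into stain concentrations,
    applies a random per-stain scale and offset, then reconvolves.
    """
    rng = rng or np.random.default_rng()
    od = -np.log(np.clip(image, 1e-6, 1.0))      # optical density
    conc = od @ np.linalg.inv(STAIN_MATRIX)      # stain concentrations
    scale = 1.0 + rng.uniform(-sigma, sigma, size=3)
    shift = rng.uniform(-sigma, sigma, size=3)
    conc = conc * scale + shift                  # per-stain jitter
    od_jittered = conc @ STAIN_MATRIX
    return np.clip(np.exp(-od_jittered), 0.0, 1.0)

# Toy example: a uniform gray 2x2 patch.
img = np.full((2, 2, 3), 0.5)
out = stain_color_jitter(img, sigma=0.05, rng=np.random.default_rng(0))
```

Because the perturbation acts on stain channels rather than raw RGB, it targets the cross-hospital staining variation that the paper identifies as the domain-specific nuisance factor.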