A Fine-Grained Analysis on Distribution Shift

Authors: Olivia Wiles, Sven Gowal, Florian Stimberg, Sylvestre-Alvise Rebuffi, Ira Ktena, Krishnamurthy Dj Dvijotham, Ali Taylan Cemgil

ICLR 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We provide a holistic analysis of current state-of-the-art methods by evaluating 19 distinct methods grouped into five categories across both synthetic and real-world datasets. Overall, we train more than 85K models.
Researcher Affiliation | Industry | DeepMind, London, UK. {oawiles,sgowal,stimberg,sylvestre,iraktena,taylancemgil}@deepmind.com; dvij@google.com
Pseudocode | No | The paper describes methods and processes in narrative text and does not include any structured pseudocode or algorithm blocks.
Open Source Code | Yes | Code is available at github.com/deepmind/distribution_shift_framework.
Open Datasets | Yes | We evaluate these approaches on six vision classification datasets: DSPRITES (Matthey et al., 2017), MPI3D (Gondal et al., 2019), SMALLNORB (LeCun et al., 2004), SHAPES3D (Burgess & Kim, 2018), CAMELYON17 (Koh et al., 2020; Bandi et al., 2018), and IWILDCAM (Koh et al., 2020; Beery et al., 2018). (A dataset-loading sketch follows the table.)
Dataset Splits | Yes | To perform model selection, we choose the best model according to the validation set that matches the distribution of the test set. In the unseen data shift setting for CAMELYON17 and IWILDCAM, we use the given out-of-distribution validation set, which is a distinct set in D that is independent of D_train and D_test.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., exact GPU/CPU models, processor types, or memory amounts) used for running its experiments.
Software Dependencies | No | The paper does not provide specific ancillary software details, such as library or solver names with version numbers, needed to replicate the experiment.
Experiment Setup | Yes | We perform a sweep over the hyperparameters (the precise sweeps are given in appendix F.8). We run each set of hyperparameters for five seeds for each setting. To choose the best model for each seed, we perform model selection over all hyperparameters using the top-1 accuracy on the validation set. (The sweep-and-selection loop is sketched below the table.)
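
The synthetic datasets in the Open Datasets row are publicly distributed. As a minimal sketch, assuming the TensorFlow Datasets builders "dsprites" and "shapes3d" (these builder names are not taken from the paper), one way to load two of them and inspect their generative factors is:

```python
# Minimal sketch (assumption): loading two of the six datasets via TensorFlow
# Datasets; the paper's own input pipeline lives in the released repository
# and may differ. CAMELYON17 and IWILDCAM are distributed through the WILDS
# benchmark (Koh et al., 2020) rather than TFDS.
import tensorflow_datasets as tfds

dsprites = tfds.load("dsprites", split="train")
shapes3d = tfds.load("shapes3d", split="train")

for example in dsprites.take(1):
    # Each example carries the image together with its generative latent
    # factors (shape, scale, orientation, position), which is what enables
    # the paper's fine-grained control over distribution shift.
    print(sorted(example.keys()))
```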
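
The protocol in the Dataset Splits and Experiment Setup rows (a hyperparameter sweep, five seeds per setting, and per-seed model selection by top-1 validation accuracy, using the out-of-distribution validation split for CAMELYON17 and IWILDCAM) can be summarised as follows. This is a sketch under stated assumptions: the grid values and train_and_evaluate are hypothetical placeholders, not the released code; the actual sweeps are listed in appendix F.8 of the paper.

```python
# Minimal sketch (assumption): the per-seed sweep and model-selection loop
# described in the table above. The hyperparameter grid and
# train_and_evaluate are hypothetical placeholders.
import itertools
import random

SWEEP = {"learning_rate": [1e-4, 1e-3], "weight_decay": [0.0, 1e-5]}  # placeholder grid
SEEDS = range(5)  # five seeds per hyperparameter setting


def train_and_evaluate(hparams, seed):
    """Placeholder: train one model and return its top-1 accuracy on the
    validation split (the out-of-distribution split for CAMELYON17/IWILDCAM)."""
    # Stand-in for actual training: a deterministic pseudo-random score.
    return random.Random(hash((seed, tuple(sorted(hparams.items()))))).random()


best_per_seed = {}
for seed in SEEDS:
    candidates = []
    for values in itertools.product(*SWEEP.values()):
        hparams = dict(zip(SWEEP, values))
        candidates.append((train_and_evaluate(hparams, seed), hparams))
    # For each seed, keep the configuration with the highest top-1 validation accuracy.
    best_per_seed[seed] = max(candidates, key=lambda c: c[0])

print(best_per_seed)
```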