A Fine-Grained Analysis on Distribution Shift
Authors: Olivia Wiles, Sven Gowal, Florian Stimberg, Sylvestre-Alvise Rebuffi, Ira Ktena, Krishnamurthy Dj Dvijotham, Ali Taylan Cemgil
ICLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We provide a holistic analysis of current state-of-the-art methods by evaluating 19 distinct methods grouped into five categories across both synthetic and real-world datasets. Overall, we train more than 85K models. |
| Researcher Affiliation | Industry | DeepMind, London, UK. {oawiles,sgowal,stimberg,sylvestre,iraktena,taylancemgil}@deepmind.com; dvij@google.com |
| Pseudocode | No | The paper describes methods and processes in narrative text and does not include any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code is available at github.com/deepmind/distribution_shift_framework. |
| Open Datasets | Yes | Datasets. We evaluate these approaches on six vision classification datasets: DSPRITES (Matthey et al., 2017), MPI3D (Gondal et al., 2019), SMALLNORB (LeCun et al., 2004), SHAPES3D (Burgess & Kim, 2018), CAMELYON17 (Koh et al., 2020; Bandi et al., 2018), and IWILDCAM (Koh et al., 2020; Beery et al., 2018). |
| Dataset Splits | Yes | To perform model selection, we choose the best model according to the validation set that matches the distribution of the test set. In the unseen data shift setting for CAMELYON17 and IWILDCAM, we use the given out-of-distribution validation set, which is a distinct set in D, independent of D_train and D_test. (See the loading sketch after the table.) |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., exact GPU/CPU models, processor types, or memory amounts) used for running its experiments. |
| Software Dependencies | No | The paper does not provide specific ancillary software details, such as library or solver names with version numbers, needed to replicate the experiment. |
| Experiment Setup | Yes | We perform a sweep over the hyperparameters (the precise sweeps are given in appendix F.8). We run each set of hyperparameters for five seeds for each setting. To choose the best model for each seed, we perform model selection over all hyperparameters using the top-1 accuracy on the validation set. (A sketch of this selection loop follows the table.) |
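
The out-of-distribution validation splits for CAMELYON17 and IWILDCAM referenced in the Dataset Splits row ship with the WILDS benchmark. A minimal loading sketch using the `wilds` package (illustrative only; not the authors' released code):

```python
# Sketch: loading CAMELYON17 with its out-of-distribution (OOD)
# validation split via the WILDS package. Illustrates the split setup
# quoted above; not taken from the authors' repository.
from wilds import get_dataset

dataset = get_dataset(dataset="camelyon17", download=True)

# "train" and "test" play the roles of D_train and D_test; "val" is the
# provided OOD validation set, disjoint from both, used for model
# selection in the unseen data shift setting ("id_val" is the
# in-distribution alternative).
train_data = dataset.get_subset("train")
ood_val_data = dataset.get_subset("val")
test_data = dataset.get_subset("test")
```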
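
The Experiment Setup row describes the selection protocol: sweep the hyperparameters, run five seeds per setting, and pick the best model per seed by top-1 validation accuracy. A minimal sketch of that loop, with a hypothetical `train_and_evaluate` stand-in for the framework's training code (the real sweep grids are in the paper's appendix F.8):

```python
import itertools
import random

# Hypothetical sweep grid; the paper's precise sweeps are in appendix F.8.
SWEEP = {
    "learning_rate": [1e-4, 1e-3, 1e-2],
    "weight_decay": [0.0, 1e-5, 1e-4],
}
NUM_SEEDS = 5  # the paper runs five seeds per setting


def train_and_evaluate(hparams, seed):
    """Hypothetical stand-in for the framework's training loop.

    Should return top-1 accuracy on the validation set that matches the
    test distribution (or the OOD validation set for the WILDS datasets).
    Here it returns a deterministic placeholder so the sketch runs.
    """
    rng = random.Random(hash((tuple(sorted(hparams.items())), seed)))
    return rng.random()


# For each seed, select the hyperparameters with the highest top-1
# validation accuracy, mirroring the protocol quoted above.
best_per_seed = {}
for seed in range(NUM_SEEDS):
    candidates = [
        dict(zip(SWEEP, values))
        for values in itertools.product(*SWEEP.values())
    ]
    best_per_seed[seed] = max(
        candidates, key=lambda hp: train_and_evaluate(hp, seed)
    )
print(best_per_seed)
```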