A Fine-Grained Analysis on Distribution Shift

Authors: Olivia Wiles, Sven Gowal, Florian Stimberg, Sylvestre-Alvise Rebuffi, Ira Ktena, Krishnamurthy Dj Dvijotham, Ali Taylan Cemgil

ICLR 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We provide a holistic analysis of current state-of-the-art methods by evaluating 19 distinct methods grouped into five categories across both synthetic and real-world datasets. Overall, we train more than 85K models.
Researcher Affiliation | Industry | DeepMind, London, UK. {oawiles,sgowal,stimberg,sylvestre,iraktena,taylancemgil}@deepmind.com; dvij@google.com
Pseudocode | No | The paper describes methods and processes in narrative text and does not include any structured pseudocode or algorithm blocks.
Open Source Code | Yes | Code is available at github.com/deepmind/distribution_shift_framework.
Open Datasets | Yes | We evaluate these approaches on six vision classification datasets: DSPRITES (Matthey et al., 2017), MPI3D (Gondal et al., 2019), SMALLNORB (LeCun et al., 2004), SHAPES3D (Burgess & Kim, 2018), CAMELYON17 (Koh et al., 2020; Bandi et al., 2018), and IWILDCAM (Koh et al., 2020; Beery et al., 2018). (A dataset-loading sketch follows the table.)
Dataset Splits | Yes | To perform model selection, we choose the best model according to the validation set that matches the distribution of the test set. In the unseen data shift setting for CAMELYON17 and IWILDCAM, we use the given out-of-distribution validation set, which is a distinct set in D that is independent of D_train and D_test.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., exact GPU/CPU models, processor types, or memory amounts) used for running its experiments.
Software Dependencies | No | The paper does not provide specific ancillary software details, such as library or solver names with version numbers, needed to replicate the experiment.
Experiment Setup | Yes | We perform a sweep over the hyperparameters (the precise sweeps are given in appendix F.8). We run each set of hyperparameters for five seeds for each setting. To choose the best model for each seed, we perform model selection over all hyperparameters using the top-1 accuracy on the validation set. (The sweep-and-selection loop is sketched below the table.)
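
The synthetic datasets in the Open Datasets row are publicly distributed. As a minimal sketch, assuming the TensorFlow Datasets builders "dsprites" and "shapes3d" (these builder names are not taken from the paper), one way to load two of them and inspect their generative factors is:

```python
# Minimal sketch (assumption): loading two of the six datasets via TensorFlow
# Datasets; the paper's own input pipeline lives in the released repository
# and may differ. CAMELYON17 and IWILDCAM are distributed through the WILDS
# benchmark (Koh et al., 2020) rather than TFDS.
import tensorflow_datasets as tfds

dsprites = tfds.load("dsprites", split="train")
shapes3d = tfds.load("shapes3d", split="train")

for example in dsprites.take(1):
    # Each example carries the image together with its generative latent
    # factors (shape, scale, orientation, position), which is what enables
    # the paper's fine-grained control over distribution shift.
    print(sorted(example.keys()))
```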
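
The protocol in the Dataset Splits and Experiment Setup rows (a hyperparameter sweep, five seeds per setting, and per-seed model selection by top-1 validation accuracy, using the out-of-distribution validation split for CAMELYON17 and IWILDCAM) can be summarised as follows. This is a sketch under stated assumptions: the grid values and train_and_evaluate are hypothetical placeholders, not the released code; the actual sweeps are listed in appendix F.8 of the paper.

```python
# Minimal sketch (assumption): the per-seed sweep and model-selection loop
# described in the table above. The hyperparameter grid and
# train_and_evaluate are hypothetical placeholders.
import itertools
import random

SWEEP = {"learning_rate": [1e-4, 1e-3], "weight_decay": [0.0, 1e-5]}  # placeholder grid
SEEDS = range(5)  # five seeds per hyperparameter setting


def train_and_evaluate(hparams, seed):
    """Placeholder: train one model and return its top-1 accuracy on the
    validation split (the out-of-distribution split for CAMELYON17/IWILDCAM)."""
    # Stand-in for actual training: a deterministic pseudo-random score.
    return random.Random(hash((seed, tuple(sorted(hparams.items()))))).random()


best_per_seed = {}
for seed in SEEDS:
    candidates = []
    for values in itertools.product(*SWEEP.values()):
        hparams = dict(zip(SWEEP, values))
        candidates.append((train_and_evaluate(hparams, seed), hparams))
    # For each seed, keep the configuration with the highest top-1 validation accuracy.
    best_per_seed[seed] = max(candidates, key=lambda c: c[0])

print(best_per_seed)
```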