Unreproducible Research is Reproducible

Authors: Xavier Bouthillier, César Laurent, Pascal Vincent

ICML 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We provide experiments to exemplify the brittleness of current common practice in the evaluation of models in the field of deep learning, showing that even if the results could be reproduced, a slightly different experiment would not support the findings.
Researcher Affiliation | Collaboration | Mila, Université de Montréal; Facebook AI Research; Canadian Institute for Advanced Research (CIFAR).
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks.
Open Source Code | Yes | In agreement with the solutions proposed by the community for methods reproducibility, our code is available publicly, including the data generated in this work and containers to simplify re-execution at github.com/bouthilx/repro-icml-2019
Open Datasets | Yes | we will run the seed replicates on different datasets, namely MNIST (LeCun et al., 1998), SVHN (Netzer et al., 2011), CIFAR10, CIFAR100 (Krizhevsky & Hinton, 2009), EMNIST-balanced (Cohen et al., 2017) and Tiny ImageNet (et al, 2019). (A dataset-loading sketch appears after the table.)
Dataset Splits | No | The paper mentions using a 'validation set' for hyperparameter optimization and a 'test set' for evaluation, but it does not provide specific details on the dataset splits (e.g., exact percentages, sample counts, or an explicit citation to a predefined split) used for training, validation, or testing.
Hardware Specification | No | The paper mentions receiving computational resources from 'Compute Canada and Microsoft Azure', but it does not provide specific hardware details such as exact GPU/CPU models, processor types, or memory amounts used for running experiments.
Software Dependencies | No | The paper mentions frameworks like Theano, PyTorch, and TensorFlow with citations, but it does not provide specific version numbers for these or other ancillary software components used in their experiments.
Experiment Setup | Yes | For each model, we sample 10 different seeds for the pseudorandom generator used for both the initialization of the model parameters and the ordering of the data presented by the data iterator. All models are trained for 120 epochs on the same dataset. Considered hyper-parameters include the learning rate and momentum as well as weight-decay (L2 regularization strength). The hyperparameter optimization will be executed using a slightly modified version of ASHA (Li et al., 2018), with budgets of 15, 30, 60 and 120 epochs and a reduction factor of 4. (Illustrative seeding and budget sketches follow the table.)
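
The benchmarks listed under Open Datasets can be loaded with standard torchvision loaders. The following is a minimal sketch only: the data root, the training-split choices, and the Tiny ImageNet handling are assumptions, and the authors' pipeline at github.com/bouthilx/repro-icml-2019 may prepare the data differently.

```python
# Hedged sketch: loading the benchmarks named in the paper via torchvision.
# The "data/" root and the Tiny ImageNet handling are assumptions; the
# authors' repository may use a different loading pipeline.
from torchvision import datasets, transforms

to_tensor = transforms.ToTensor()

train_sets = {
    "mnist": datasets.MNIST("data/", train=True, download=True, transform=to_tensor),
    "svhn": datasets.SVHN("data/", split="train", download=True, transform=to_tensor),
    "cifar10": datasets.CIFAR10("data/", train=True, download=True, transform=to_tensor),
    "cifar100": datasets.CIFAR100("data/", train=True, download=True, transform=to_tensor),
    "emnist": datasets.EMNIST("data/", split="balanced", train=True,
                              download=True, transform=to_tensor),
    # Tiny ImageNet has no built-in torchvision loader; ImageFolder over an
    # extracted copy is a common substitute (the path below is hypothetical).
    # "tinyimagenet": datasets.ImageFolder("data/tiny-imagenet-200/train",
    #                                      transform=to_tensor),
}
```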
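
The seed-replicate protocol quoted under Experiment Setup, where one pseudorandom seed controls both parameter initialization and data ordering, can be illustrated as below. This is a minimal sketch assuming a PyTorch training loop; `run_seed_replicate` and `build_model` are hypothetical names, not the authors' API.

```python
# Hedged sketch of the seed-replicate protocol: a single integer seed controls
# both parameter initialization and the ordering of the data iterator.
import random
import numpy as np
import torch
from torch.utils.data import DataLoader

def run_seed_replicate(seed, train_set, build_model, epochs=120, batch_size=128):
    # Seed every pseudorandom generator that can affect the run.
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)            # controls parameter initialization
    torch.cuda.manual_seed_all(seed)

    model = build_model()              # weights drawn under `seed`
    loader = DataLoader(
        train_set, batch_size=batch_size, shuffle=True,
        generator=torch.Generator().manual_seed(seed),  # controls data ordering
    )
    for _ in range(epochs):
        for images, targets in loader:
            ...  # forward/backward/update omitted

    return model

# Ten seed replicates per model, as described in the setup.
# replicates = [run_seed_replicate(s, train_set, build_model) for s in range(10)]
```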
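
The quoted ASHA configuration (budgets of 15, 30, 60 and 120 epochs with a reduction factor of 4) can be read as a successive-halving-style promotion schedule, where roughly the best quarter of configurations advances to the next budget. The sketch below only illustrates that promotion ratio with a hypothetical starting pool; it omits ASHA's asynchronous scheduling and is not the authors' implementation.

```python
# Hedged illustration of the budget schedule: with a reduction factor of 4,
# roughly the top quarter of configurations is promoted at each rung.
# The starting pool size (256) is hypothetical.
budgets = [15, 30, 60, 120]   # epochs per rung, as quoted from the paper
reduction_factor = 4
n_configs = 256

for rung, budget in enumerate(budgets):
    print(f"rung {rung}: {n_configs} configurations trained up to {budget} epochs")
    n_configs = max(1, n_configs // reduction_factor)
```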