Unreproducible Research is Reproducible

Authors: Xavier Bouthillier, César Laurent, Pascal Vincent

ICML 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We provide experiments to exemplify the brittleness of current common practice in the evaluation of models in the field of deep learning, showing that even if the results could be reproduced, a slightly different experiment would not support the findings.
Researcher Affiliation | Collaboration | Mila, Université de Montréal; Facebook AI Research; Canadian Institute for Advanced Research (CIFAR).
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks.
Open Source Code | Yes | In agreement with the solutions proposed by the community for methods reproducibility, our code is available publicly, including the data generated in this work and containers to simplify re-execution at github.com/bouthilx/repro-icml-2019
Open Datasets | Yes | we will run the seed replicates on different datasets, namely MNIST (LeCun et al., 1998), SVHN (Netzer et al., 2011), CIFAR10, CIFAR100 (Krizhevsky & Hinton, 2009), EMNIST-balanced (Cohen et al., 2017) and Tiny ImageNet (et al, 2019). (A dataset-loading sketch appears after the table.)
Dataset Splits | No | The paper mentions using a 'validation set' for hyperparameter optimization and a 'test set' for evaluation, but it does not provide specific details on the dataset splits (e.g., exact percentages, sample counts, or an explicit citation to a predefined split) used for training, validation, or testing.
Hardware Specification | No | The paper mentions receiving computational resources from 'Compute Canada and Microsoft Azure', but it does not provide specific hardware details such as exact GPU/CPU models, processor types, or memory amounts used for running experiments.
Software Dependencies | No | The paper mentions frameworks like Theano, PyTorch, and TensorFlow with citations, but it does not provide specific version numbers for these or other ancillary software components used in their experiments.
Experiment Setup | Yes | For each model, we sample 10 different seeds for the pseudorandom generator used for both the initialization of the model parameters and the ordering of the data presented by the data iterator. All models are trained for 120 epochs on the same dataset. Considered hyper-parameters include the learning rate and momentum as well as weight-decay (L2 regularization strength). The hyperparameter optimization will be executed using a slightly modified version of ASHA (Li et al., 2018), with budgets of 15, 30, 60 and 120 epochs and a reduction factor of 4. (Illustrative seeding and budget sketches follow the table.)
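
The benchmarks listed under Open Datasets can be loaded with standard torchvision loaders. The following is a minimal sketch only: the data root, the training-split choices, and the Tiny ImageNet handling are assumptions, and the authors' pipeline at github.com/bouthilx/repro-icml-2019 may prepare the data differently.

```python
# Hedged sketch: loading the benchmarks named in the paper via torchvision.
# The "data/" root and the Tiny ImageNet handling are assumptions; the
# authors' repository may use a different loading pipeline.
from torchvision import datasets, transforms

to_tensor = transforms.ToTensor()

train_sets = {
    "mnist": datasets.MNIST("data/", train=True, download=True, transform=to_tensor),
    "svhn": datasets.SVHN("data/", split="train", download=True, transform=to_tensor),
    "cifar10": datasets.CIFAR10("data/", train=True, download=True, transform=to_tensor),
    "cifar100": datasets.CIFAR100("data/", train=True, download=True, transform=to_tensor),
    "emnist": datasets.EMNIST("data/", split="balanced", train=True,
                              download=True, transform=to_tensor),
    # Tiny ImageNet has no built-in torchvision loader; ImageFolder over an
    # extracted copy is a common substitute (the path below is hypothetical).
    # "tinyimagenet": datasets.ImageFolder("data/tiny-imagenet-200/train",
    #                                      transform=to_tensor),
}
```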
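
The seed-replicate protocol quoted under Experiment Setup, where one pseudorandom seed controls both parameter initialization and data ordering, can be illustrated as below. This is a minimal sketch assuming a PyTorch training loop; `run_seed_replicate` and `build_model` are hypothetical names, not the authors' API.

```python
# Hedged sketch of the seed-replicate protocol: a single integer seed controls
# both parameter initialization and the ordering of the data iterator.
import random
import numpy as np
import torch
from torch.utils.data import DataLoader

def run_seed_replicate(seed, train_set, build_model, epochs=120, batch_size=128):
    # Seed every pseudorandom generator that can affect the run.
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)            # controls parameter initialization
    torch.cuda.manual_seed_all(seed)

    model = build_model()              # weights drawn under `seed`
    loader = DataLoader(
        train_set, batch_size=batch_size, shuffle=True,
        generator=torch.Generator().manual_seed(seed),  # controls data ordering
    )
    for _ in range(epochs):
        for images, targets in loader:
            ...  # forward/backward/update omitted

    return model

# Ten seed replicates per model, as described in the setup.
# replicates = [run_seed_replicate(s, train_set, build_model) for s in range(10)]
```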
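
The quoted ASHA configuration (budgets of 15, 30, 60 and 120 epochs with a reduction factor of 4) can be read as a successive-halving-style promotion schedule, where roughly the best quarter of configurations advances to the next budget. The sketch below only illustrates that promotion ratio with a hypothetical starting pool; it omits ASHA's asynchronous scheduling and is not the authors' implementation.

```python
# Hedged illustration of the budget schedule: with a reduction factor of 4,
# roughly the top quarter of configurations is promoted at each rung.
# The starting pool size (256) is hypothetical.
budgets = [15, 30, 60, 120]   # epochs per rung, as quoted from the paper
reduction_factor = 4
n_configs = 256

for rung, budget in enumerate(budgets):
    print(f"rung {rung}: {n_configs} configurations trained up to {budget} epochs")
    n_configs = max(1, n_configs // reduction_factor)
```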