Unreproducible Research is Reproducible
Authors: Xavier Bouthillier, César Laurent, Pascal Vincent
ICML 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We provide experiments to exemplify the brittleness of current common practice in the evaluation of models in the field of deep learning, showing that even if the results could be reproduced, a slightly different experiment would not support the findings. |
| Researcher Affiliation | Collaboration | Mila, Université de Montréal; Facebook AI Research; Canadian Institute for Advanced Research (CIFAR). |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | In agreement with the solutions proposed by the community for methods reproducibility, our code is available publicly, including the data generated in this work and containers to simplify re-execution at github.com/bouthilx/repro-icml-2019 |
| Open Datasets | Yes | we will run the seed replicates on different datasets, namely MNIST (LeCun et al., 1998), SVHN (Netzer et al., 2011), CIFAR10, CIFAR100 (Krizhevsky & Hinton, 2009), EMNIST-balanced (Cohen et al., 2017) and Tiny ImageNet (et al., 2019). |
| Dataset Splits | No | The paper mentions using a 'validation set' for hyperparameter optimization and 'test set' for evaluation, but it does not provide specific details on the dataset splits (e.g., exact percentages, sample counts, or explicit citation to a predefined split) used for training, validation, or testing. |
| Hardware Specification | No | The paper mentions receiving computational resources from 'Compute Canada and Microsoft Azure', but it does not provide specific hardware details such as exact GPU/CPU models, processor types, or memory amounts used for running experiments. |
| Software Dependencies | No | The paper mentions frameworks like Theano, PyTorch, and TensorFlow with citations, but it does not provide specific version numbers for these or other ancillary software components used in their experiments. |
| Experiment Setup | Yes | For each model, we sample 10 different seeds for the pseudo-random generator used for both the initialization of the model parameters and the ordering of the data presented by the data iterator. All models are trained for 120 epochs on the same dataset. Considered hyper-parameters include the learning rate and momentum as well as weight decay (L2 regularization strength). The hyperparameter optimization will be executed using a slightly modified version of ASHA (Li et al., 2018), with budgets of 15, 30, 60 and 120 epochs and a reduction factor of 4. Minimal sketches of the seeding protocol and the halving schedule follow the table. |
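
The experiment setup ties a single seed to both parameter initialization and data ordering. Below is a minimal sketch of that idea, assuming a PyTorch setup; the model, dataset choice, and `make_replicate` helper are illustrative placeholders, not the authors' code (their implementation lives at github.com/bouthilx/repro-icml-2019).

```python
import torch
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

def make_replicate(seed, root="./data", batch_size=128):
    # Seed the global RNG that PyTorch uses for parameter initialization.
    torch.manual_seed(seed)
    model = torch.nn.Sequential(      # illustrative stand-in model
        torch.nn.Flatten(),
        torch.nn.Linear(28 * 28, 10),
    )
    # Use a dedicated, identically seeded generator for the data
    # iterator so the shuffling order is tied to the same seed.
    g = torch.Generator()
    g.manual_seed(seed)
    train_set = datasets.MNIST(root, train=True, download=True,
                               transform=transforms.ToTensor())
    loader = DataLoader(train_set, batch_size=batch_size,
                        shuffle=True, generator=g)
    return model, loader

# Ten seed replicates, as in the protocol quoted above.
replicates = [make_replicate(seed) for seed in range(10)]
```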
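
The quoted budgets and reduction factor imply a successive-halving schedule in which each rung keeps roughly the top 1/4 of configurations and quadruples their training budget. The sketch below is a simplified synchronous variant of that schedule, not ASHA itself (which promotes trials asynchronously); `train_and_eval` is a hypothetical stand-in for training a configuration and returning its validation error.

```python
import random

def train_and_eval(config, epochs):
    # Hypothetical placeholder: pretend to train `config` for `epochs`
    # epochs and return a validation error (lower is better).
    random.seed(hash((config, epochs)) % (2**32))
    return random.random()

def successive_halving(configs, budgets=(15, 30, 60, 120), eta=4):
    survivors = list(configs)
    for epochs in budgets:
        # Evaluate every surviving configuration at this rung's budget.
        scores = {cfg: train_and_eval(cfg, epochs) for cfg in survivors}
        # Promote only the best 1/eta fraction to the next, larger budget.
        keep = max(1, len(survivors) // eta)
        survivors = sorted(survivors, key=scores.get)[:keep]
    return survivors[0]

# With 64 starting configurations: 64 -> 16 -> 4 -> 1 across the rungs.
best = successive_halving([f"cfg-{i}" for i in range(64)])
```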