Understanding deep learning requires rethinking generalization
Authors: Chiyuan Zhang, Samy Bengio, Moritz Hardt, Benjamin Recht, Oriol Vinyals
ICLR 2017 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Through extensive systematic experiments, we show how these traditional approaches fail to explain why large neural networks generalize well in practice. Specifically, our experiments establish that state-of-the-art convolutional networks for image classification trained with stochastic gradient methods easily fit a random labeling of the training data. (A minimal sketch of this randomization test appears below the table.) |
| Researcher Affiliation | Collaboration | Chiyuan Zhang (Massachusetts Institute of Technology, chiyuan@mit.edu); Samy Bengio (Google Brain, bengio@google.com); Moritz Hardt (Google Brain, mrtz@google.com); Benjamin Recht (University of California, Berkeley, brecht@berkeley.edu); Oriol Vinyals (Google DeepMind, vinyals@google.com) |
| Pseudocode | No | No pseudocode or clearly labeled algorithm block found in the paper. |
| Open Source Code | No | The paper references "TensorFlow: Large-scale machine learning on heterogeneous systems, 2015. URL http://tensorflow.org/. Software available from tensorflow.org." This refers to a third-party tool used, not the authors' own source code for their methodology. |
| Open Datasets | Yes | The experiments are run on two image classification datasets, the CIFAR10 dataset (Krizhevsky & Hinton, 2009) and the ImageNet (Russakovsky et al., 2015) ILSVRC 2012 dataset. |
| Dataset Splits | Yes | The CIFAR10 dataset contains 50,000 training and 10,000 validation images, split into 10 classes. The ImageNet dataset contains 1,281,167 training and 50,000 validation images, split into 1000 classes. |
| Hardware Specification | No | The paper mentions "We run the ImageNet experiment in a distributed asynchronized SGD system with 50 workers," but provides no specific details about the hardware used (e.g., GPU/CPU models, memory, cloud instance types). |
| Software Dependencies | No | The paper mentions "TENSORFLOW (Abadi et al., 2015)" but does not specify a version number for it or any other software dependency. |
| Experiment Setup | Yes | For all experiments on CIFAR10, we train using SGD with a momentum parameter of 0.9. An initial learning rate of 0.1 (for small Inception) or 0.01 (for small Alexnet and MLPs) is used, with a decay factor of 0.95 per training epoch. Unless otherwise specified, for the experiments with randomized labels or pixels, we train the networks without weight decay, dropout, or other forms of explicit regularization. Section 3 discusses the effects of various regularizers on fitting the networks and generalization. (A hedged sketch of this optimizer configuration appears below the table.) |
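
The randomization test quoted in the Research Type row is straightforward to reproduce in outline. The following is a minimal sketch, not the authors' code: the paper trained small Inception, AlexNet, and MLP variants in TensorFlow, whereas this sketch uses a placeholder MLP in PyTorch purely to illustrate the label-randomization step on CIFAR10.

```python
# Minimal sketch of the paper's randomization test (not the authors' code).
# Assumption: a placeholder PyTorch MLP stands in for the paper's small
# Inception / AlexNet / MLP architectures; only the idea of replacing every
# training label with an independent uniform draw comes from the paper.
import numpy as np
import torch
import torchvision
import torchvision.transforms as T

rng = np.random.RandomState(0)

train_set = torchvision.datasets.CIFAR10(
    root="./data", train=True, download=True, transform=T.ToTensor())

# Replace the true labels with uniformly random class indices.
train_set.targets = rng.randint(0, 10, size=len(train_set.targets)).tolist()

loader = torch.utils.data.DataLoader(train_set, batch_size=128, shuffle=True)

model = torch.nn.Sequential(          # placeholder model, not from the paper
    torch.nn.Flatten(),
    torch.nn.Linear(3 * 32 * 32, 512),
    torch.nn.ReLU(),
    torch.nn.Linear(512, 10),
)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
loss_fn = torch.nn.CrossEntropyLoss()

for epoch in range(100):              # train until the random labels are memorized
    for images, labels in loader:
        optimizer.zero_grad()
        loss_fn(model(images), labels).backward()
        optimizer.step()
```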
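
The optimizer settings quoted in the Experiment Setup row translate directly into a standard training configuration. In the sketch below, the learning-rate values, the momentum, and the 0.95-per-epoch decay are taken from the paper; the model, the epoch count, and the use of PyTorch's `ExponentialLR` scheduler are illustrative assumptions, not the authors' implementation.

```python
# Sketch of the reported CIFAR10 optimizer settings: SGD with momentum 0.9,
# initial learning rate 0.1 (small Inception) or 0.01 (small Alexnet / MLPs),
# decayed by 0.95 per epoch, with no weight decay, dropout, or other explicit
# regularization. The model and epoch count are placeholders.
import torch

model = torch.nn.Sequential(torch.nn.Flatten(),
                            torch.nn.Linear(3 * 32 * 32, 10))  # placeholder

optimizer = torch.optim.SGD(
    model.parameters(),
    lr=0.1,            # 0.01 for the small Alexnet and MLP variants
    momentum=0.9,
    weight_decay=0.0,  # explicit regularization switched off
)
# Multiply the learning rate by 0.95 at the end of every epoch.
scheduler = torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma=0.95)

for epoch in range(100):               # placeholder epoch count
    # ... one full pass over the CIFAR10 training set would go here ...
    scheduler.step()
```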