Understanding deep learning requires rethinking generalization
Authors: Chiyuan Zhang, Samy Bengio, Moritz Hardt, Benjamin Recht, Oriol Vinyals
ICLR 2017 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Through extensive systematic experiments, we show how these traditional approaches fail to explain why large neural networks generalize well in practice. Specifically, our experiments establish that state-of-the-art convolutional networks for image classification trained with stochastic gradient methods easily fit a random labeling of the training data. (A minimal sketch of this randomization test appears below the table.) |
| Researcher Affiliation | Collaboration | Chiyuan Zhang (Massachusetts Institute of Technology, chiyuan@mit.edu); Samy Bengio (Google Brain, bengio@google.com); Moritz Hardt (Google Brain, mrtz@google.com); Benjamin Recht (University of California, Berkeley, brecht@berkeley.edu); Oriol Vinyals (Google DeepMind, vinyals@google.com) |
| Pseudocode | No | No pseudocode or clearly labeled algorithm block found in the paper. |
| Open Source Code | No | The paper references "TensorFlow: Large-scale machine learning on heterogeneous systems, 2015. URL http://tensorflow.org/. Software available from tensorflow.org." This refers to a third-party tool used, not the authors' own source code for their methodology. |
| Open Datasets | Yes | The experiments are run on two image classification datasets, the CIFAR10 dataset (Krizhevsky & Hinton, 2009) and the ImageNet (Russakovsky et al., 2015) ILSVRC 2012 dataset. |
| Dataset Splits | Yes | The CIFAR10 dataset contains 50,000 training and 10,000 validation images, split into 10 classes. The ImageNet dataset contains 1,281,167 training and 50,000 validation images, split into 1000 classes. |
| Hardware Specification | No | The paper mentions "We run the ImageNet experiment in a distributed asynchronized SGD system with 50 workers," but provides no specific details about the hardware used (e.g., GPU/CPU models, memory, cloud instance types). |
| Software Dependencies | No | The paper mentions "TENSORFLOW (Abadi et al., 2015)" but does not specify a version number for it or any other software dependency. |
| Experiment Setup | Yes | For all experiments on CIFAR10, we train using SGD with a momentum parameter of 0.9. An initial learning rate of 0.1 (for small Inception) or 0.01 (for small Alexnet and MLPs) is used, with a decay factor of 0.95 per training epoch. Unless otherwise specified, for the experiments with randomized labels or pixels, we train the networks without weight decay, dropout, or other forms of explicit regularization. Section 3 discusses the effects of various regularizers on fitting the networks and generalization. (A hedged sketch of this optimizer configuration appears below the table.) |
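
The randomization test quoted in the Research Type row is straightforward to reproduce in outline. The following is a minimal sketch, not the authors' code: the paper trained small Inception, AlexNet, and MLP variants in TensorFlow, whereas this sketch uses a placeholder MLP in PyTorch purely to illustrate the label-randomization step on CIFAR10.

```python
# Minimal sketch of the paper's randomization test (not the authors' code).
# Assumption: a placeholder PyTorch MLP stands in for the paper's small
# Inception / AlexNet / MLP architectures; only the idea of replacing every
# training label with an independent uniform draw comes from the paper.
import numpy as np
import torch
import torchvision
import torchvision.transforms as T

rng = np.random.RandomState(0)

train_set = torchvision.datasets.CIFAR10(
    root="./data", train=True, download=True, transform=T.ToTensor())

# Replace the true labels with uniformly random class indices.
train_set.targets = rng.randint(0, 10, size=len(train_set.targets)).tolist()

loader = torch.utils.data.DataLoader(train_set, batch_size=128, shuffle=True)

model = torch.nn.Sequential(          # placeholder model, not from the paper
    torch.nn.Flatten(),
    torch.nn.Linear(3 * 32 * 32, 512),
    torch.nn.ReLU(),
    torch.nn.Linear(512, 10),
)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
loss_fn = torch.nn.CrossEntropyLoss()

for epoch in range(100):              # train until the random labels are memorized
    for images, labels in loader:
        optimizer.zero_grad()
        loss_fn(model(images), labels).backward()
        optimizer.step()
```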
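
The optimizer settings quoted in the Experiment Setup row translate directly into a standard training configuration. In the sketch below, the learning-rate values, the momentum, and the 0.95-per-epoch decay are taken from the paper; the model, the epoch count, and the use of PyTorch's `ExponentialLR` scheduler are illustrative assumptions, not the authors' implementation.

```python
# Sketch of the reported CIFAR10 optimizer settings: SGD with momentum 0.9,
# initial learning rate 0.1 (small Inception) or 0.01 (small Alexnet / MLPs),
# decayed by 0.95 per epoch, with no weight decay, dropout, or other explicit
# regularization. The model and epoch count are placeholders.
import torch

model = torch.nn.Sequential(torch.nn.Flatten(),
                            torch.nn.Linear(3 * 32 * 32, 10))  # placeholder

optimizer = torch.optim.SGD(
    model.parameters(),
    lr=0.1,            # 0.01 for the small Alexnet and MLP variants
    momentum=0.9,
    weight_decay=0.0,  # explicit regularization switched off
)
# Multiply the learning rate by 0.95 at the end of every epoch.
scheduler = torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma=0.95)

for epoch in range(100):               # placeholder epoch count
    # ... one full pass over the CIFAR10 training set would go here ...
    scheduler.step()
```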