Exploring Generalization in Deep Learning

Authors: Behnam Neyshabur, Srinadh Bhojanapalli, David McAllester, Nati Srebro

NeurIPS 2017

Reproducibility assessment: each entry below gives the variable, the assessed result, and the supporting LLM response.
Research Type: Experimental. As an initial empirical investigation of the appropriateness of the different complexity measures, we compared the complexity (under each of the above measures) of models trained on true versus random labels. ... The results are reported in Figure 1. In this section we investigate the ability of the discussed measures to explain the generalization phenomenon discussed in the Introduction. We already saw in Figures 1 and 2 that these measures capture the difference in generalization behavior of models trained on true or random labels, including the increase in capacity as the sample size increases, and the difference in this increase between true and random labels.
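The comparison described in this entry (training identical models on true versus randomly permuted labels and contrasting a complexity measure) can be sketched as follows. This is a minimal illustration assuming PyTorch, toy data, and a simple sum-of-squared-Frobenius-norms measure standing in for the paper's norm-based measures; it is not the authors' code, and all names and values are placeholders.

```python
# Illustrative sketch (not the authors' code): train the same small network on
# true vs. randomly permuted labels and compare a norm-based complexity measure.
import torch
import torch.nn as nn
import torch.nn.functional as F

def make_mlp(hidden=128):
    # Small two-layer perceptron on flattened 28x28 inputs (MNIST-like shape).
    return nn.Sequential(nn.Flatten(), nn.Linear(784, hidden), nn.ReLU(),
                         nn.Linear(hidden, 10))

def frobenius_complexity(model):
    # Sum of squared Frobenius norms of the weight matrices; used here only as
    # one example of the norm-based measures the paper compares.
    return sum((p ** 2).sum().item() for p in model.parameters() if p.dim() == 2)

def train(model, xs, ys, epochs=10, lr=0.01):
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    for _ in range(epochs):
        opt.zero_grad()
        loss = F.cross_entropy(model(xs), ys)
        loss.backward()
        opt.step()
    return model

# Toy data standing in for a real dataset; replace with MNIST or CIFAR-10 subsets.
xs = torch.randn(512, 1, 28, 28)
ys_true = torch.randint(0, 10, (512,))
ys_random = ys_true[torch.randperm(len(ys_true))]  # labels decoupled from inputs

true_model = train(make_mlp(), xs, ys_true)
rand_model = train(make_mlp(), xs, ys_random)
print("complexity (true labels):  ", frobenius_complexity(true_model))
print("complexity (random labels):", frobenius_complexity(rand_model))
```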
Researcher Affiliation: Academia. Behnam Neyshabur, Srinadh Bhojanapalli, David McAllester, Nathan Srebro; Toyota Technological Institute at Chicago; {bneyshabur, srinadh, mcallester, nati}@ttic.edu
Pseudocode: No. The paper does not contain any pseudocode or clearly labeled algorithm blocks.
Open Source Code: No. The paper does not provide any concrete statement or link regarding the availability of its source code.
Open Datasets: Yes. Comparing different complexity measures on a VGG network trained on subsets of CIFAR10 dataset with true (blue line) or random (red line) labels. The generalization of two layer perceptron trained on MNIST with varying number of hidden units.
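For reference, the public datasets named in this entry can be obtained as below. This is a minimal sketch assuming torchvision; the subset sizes are placeholders, not values reported in the paper.

```python
# Minimal sketch of loading the public datasets named in the paper
# (CIFAR-10 and MNIST) via torchvision.
import torch
from torch.utils.data import Subset, DataLoader
from torchvision import datasets, transforms

to_tensor = transforms.ToTensor()

cifar10 = datasets.CIFAR10(root="./data", train=True, download=True,
                           transform=to_tensor)
mnist = datasets.MNIST(root="./data", train=True, download=True,
                       transform=to_tensor)

# The paper trains on CIFAR-10 subsets of increasing size; the sizes below
# are illustrative only, not the authors' values.
subset_sizes = [1000, 5000, 10000]
cifar_subsets = [Subset(cifar10, torch.randperm(len(cifar10))[:n].tolist())
                 for n in subset_sizes]

loaders = [DataLoader(s, batch_size=64, shuffle=True) for s in cifar_subsets]
print([len(s) for s in cifar_subsets], len(mnist))
```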
Dataset Splits: No. The paper mentions a 'training set' and a 'test set' (e.g., 'evaluated the learned model on an independent test set'), but does not explicitly specify a validation set or give numerical details (percentages, counts, or explicit splits) for how the data were partitioned.
Hardware Specification: No. The paper does not provide any details about the hardware (e.g., GPU/CPU models, memory) used to run the experiments.
Software Dependencies: No. The paper does not specify software dependencies with version numbers, such as the programming language, libraries, or frameworks used for implementation.
Experiment Setup: No. The paper discusses general aspects of optimization (e.g., 'simple methods such as stochastic gradient descent (SGD)', 'learning rate', 'batch sizes') and model architectures (a VGG network, a two-layer perceptron), but does not provide concrete hyperparameter values or other setup details needed for reproduction.
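Because the paper gives no concrete hyperparameters, any reproduction has to choose its own. The sketch below shows one plausible setup for the MNIST two-layer-perceptron experiment with varying hidden-unit counts; every numeric value (widths, learning rate, batch size, epochs) is an assumption for illustration, not a value taken from the paper.

```python
# Hedged reconstruction of one experiment's setup: a two-layer perceptron
# trained on MNIST with a varying number of hidden units. All hyperparameters
# below are assumed placeholders, since the paper does not report them.
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

train_set = datasets.MNIST("./data", train=True, download=True,
                           transform=transforms.ToTensor())
test_set = datasets.MNIST("./data", train=False, download=True,
                          transform=transforms.ToTensor())

def run(hidden_units, lr=0.01, batch_size=64, epochs=2):
    model = nn.Sequential(nn.Flatten(), nn.Linear(784, hidden_units), nn.ReLU(),
                          nn.Linear(hidden_units, 10))
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    loader = DataLoader(train_set, batch_size=batch_size, shuffle=True)
    for _ in range(epochs):
        for x, y in loader:
            opt.zero_grad()
            F.cross_entropy(model(x), y).backward()
            opt.step()
    # Report test error to track generalization as the network widens.
    correct = 0
    with torch.no_grad():
        for x, y in DataLoader(test_set, batch_size=256):
            correct += (model(x).argmax(1) == y).sum().item()
    return 1 - correct / len(test_set)

for h in [32, 128, 512, 2048]:  # assumed widths, not the authors' grid
    print(h, run(h))
```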