Fantastic Generalization Measures and Where to Find Them
Authors: Yiding Jiang*, Behnam Neyshabur*, Hossein Mobahi, Dilip Krishnan, Samy Bengio
ICLR 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We present the first large scale study of generalization in deep networks. We investigate more than 40 complexity measures taken from both theoretical bounds and empirical studies. We train over 10,000 convolutional networks by systematically varying commonly used hyperparameters. Hoping to uncover potentially causal relationships between each measure and generalization, we analyze carefully controlled experiments and show surprising failures of some measures as well as promising measures for further research. |
| Researcher Affiliation | Industry | Yiding Jiang, Behnam Neyshabur, Hossein Mobahi, Dilip Krishnan, Samy Bengio; Google Research; {ydjiang,neyshabur,hmobahi,dilipkay,bengio}@google.com |
| Pseudocode | Yes | Algorithm 1 Estimate Accuracy... Algorithm 2 Find σ for PAC-Bayesian Bound... Algorithm 3 Find σ for Sharpness Bound (a hedged sketch of this σ search follows the table) |
| Open Source Code | No | The paper does not provide a direct link to the source code for the methodology or explicitly state that the code is publicly released. |
| Open Datasets | Yes | In this study, we trained more than 10,000 models over two image classification datasets, namely, CIFAR-10 (Krizhevsky et al., 2014) and Street View House Numbers (SVHN) (Netzer et al., 2011). |
| Dataset Splits | No | The paper focuses on the 'generalization gap' (test error - train error) and mentions training and test sets but does not specify train/validation/test splits (e.g., percentages or sample counts) for its experiments. |
| Hardware Specification | No | The paper mentions training 'over 10,000 convolutional networks' but does not specify any details about the hardware used for these experiments, such as GPU/CPU models, memory, or specific cloud resources. |
| Software Dependencies | No | The paper mentions using Batch Normalization and different optimizers (Momentum SGD, Adam, RMSProp), but it does not specify any software frameworks (e.g., PyTorch, TensorFlow) or library versions used for the implementation or experiments. |
| Experiment Setup | Yes | We chose 7 common hyperparameter types related to optimization and architecture design, with 3 choices for each hyperparameter... The hyperparameter categories we test on are: weight decay coefficient (weight decay), width of the layer (width), mini-batch size (batch size), learning rate (learning rate), dropout probability (dropout), depth of the architecture (depth) and the choice of the optimization algorithms (optimizer). We select 3 choices for each hyperparameter (i.e. |Θ_i| = 3). Please refer to Appendix C.3 for the details on the models, and Appendix C.1 for the reasoning behind the design choices. (A hedged sketch of the resulting hyperparameter grid follows the table.) |
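
The Experiment Setup row above describes a full factorial grid over 7 hyperparameter types with 3 choices each. A minimal Python sketch of enumerating such a grid is shown below; the candidate values are illustrative placeholders (assumptions), not the paper's settings from Appendix C.1/C.3, and only the grid construction itself mirrors the described design.

```python
from itertools import product

# Hypothetical hyperparameter space: 7 types with 3 choices each, matching the
# categories named in the paper. The candidate values are placeholders only.
hparam_space = {
    "weight_decay":  [0.0, 1e-4, 5e-4],
    "width":         [1, 2, 4],                  # width multiplier (assumed)
    "batch_size":    [32, 64, 128],
    "learning_rate": [0.1, 0.01, 0.001],
    "dropout":       [0.0, 0.25, 0.5],
    "depth":         [1, 2, 3],                  # depth multiplier (assumed)
    "optimizer":     ["momentum_sgd", "adam", "rmsprop"],
}

# A full factorial grid gives |Θ_1| × ... × |Θ_7| = 3^7 = 2187 configurations.
configs = [dict(zip(hparam_space.keys(), values))
           for values in product(*hparam_space.values())]
print(len(configs))  # 2187
```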
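
The Pseudocode row lists Algorithms 2 and 3, which find a perturbation magnitude σ for the PAC-Bayesian and sharpness bounds. A minimal, hypothetical sketch is below, assuming a bisection over σ that stops when the perturbed training accuracy drops by a target amount; the function names, bracketing bounds, and tolerance are assumptions, and `estimate_accuracy` stands in for Algorithm 1.

```python
def find_sigma(estimate_accuracy, clean_acc, target_deviation=0.1,
               lo=0.0, hi=1.0, tol=1e-3, max_iters=20):
    """Bisection search (assumed) for the perturbation scale sigma at which
    accuracy under parameter noise falls `target_deviation` below `clean_acc`.
    `estimate_accuracy(sigma)` plays the role of the paper's Algorithm 1."""
    for _ in range(max_iters):
        mid = 0.5 * (lo + hi)
        deviation = clean_acc - estimate_accuracy(mid)
        if abs(deviation - target_deviation) < tol:
            return mid
        if deviation > target_deviation:
            hi = mid   # perturbation too strong; shrink sigma
        else:
            lo = mid   # perturbation too weak; grow sigma
    return 0.5 * (lo + hi)

# Toy usage with a stand-in accuracy curve that degrades linearly in sigma.
toy_accuracy = lambda sigma: 0.99 * (1.0 - sigma)
print(round(find_sigma(toy_accuracy, clean_acc=0.99), 4))  # ≈ 0.10
```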