Fantastic Generalization Measures and Where to Find Them
Authors: Yiding Jiang*, Behnam Neyshabur*, Hossein Mobahi, Dilip Krishnan, Samy Bengio
ICLR 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We present the first large scale study of generalization in deep networks. We investigate more than 40 complexity measures taken from both theoretical bounds and empirical studies. We train over 10,000 convolutional networks by systematically varying commonly used hyperparameters. Hoping to uncover potentially causal relationships between each measure and generalization, we analyze carefully controlled experiments and show surprising failures of some measures as well as promising measures for further research. |
| Researcher Affiliation | Industry | Yiding Jiang, Behnam Neyshabur, Hossein Mobahi, Dilip Krishnan, Samy Bengio; Google Research; {ydjiang,neyshabur,hmobahi,dilipkay,bengio}@google.com |
| Pseudocode | Yes | Algorithm 1 Estimate Accuracy... Algorithm 2 Find σ for PAC-Bayesian Bound... Algorithm 3 Find σ for Sharpness Bound (a hedged sketch of this σ search follows the table) |
| Open Source Code | No | The paper does not provide a direct link to the source code for the methodology or explicitly state that the code is publicly released. |
| Open Datasets | Yes | In this study, we trained more than 10,000 models over two image classification datasets, namely, CIFAR-10 (Krizhevsky et al., 2014) and Street View House Numbers (SVHN) (Netzer et al., 2011). |
| Dataset Splits | No | The paper focuses on the 'generalization gap' (test error - train error) and mentions training and test sets but does not specify train/validation/test splits (e.g., percentages or sample counts) for its experiments. |
| Hardware Specification | No | The paper mentions training 'over 10,000 convolutional networks' but does not specify any details about the hardware used for these experiments, such as GPU/CPU models, memory, or specific cloud resources. |
| Software Dependencies | No | The paper mentions using Batch Normalization and different optimizers (Momentum SGD, Adam, RMSProp), but it does not specify any software frameworks (e.g., PyTorch, TensorFlow) or library versions used for the implementation or experiments. |
| Experiment Setup | Yes | We chose 7 common hyperparameter types related to optimization and architecture design, with 3 choices for each hyperparameter... The hyperparameter categories we test on are: weight decay coefficient (weight decay), width of the layer (width), mini-batch size (batch size), learning rate (learning rate), dropout probability (dropout), depth of the architecture (depth) and the choice of the optimization algorithms (optimizer). We select 3 choices for each hyperparameter (i.e. |Θ_i| = 3). Please refer to Appendix C.3 for the details on the models, and Appendix C.1 for the reasoning behind the design choices. (A hedged sketch of the resulting hyperparameter grid follows the table.) |
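
The Experiment Setup row above describes a full factorial grid over 7 hyperparameter types with 3 choices each. A minimal Python sketch of enumerating such a grid is shown below; the candidate values are illustrative placeholders (assumptions), not the paper's settings from Appendix C.1/C.3, and only the grid construction itself mirrors the described design.

```python
from itertools import product

# Hypothetical hyperparameter space: 7 types with 3 choices each, matching the
# categories named in the paper. The candidate values are placeholders only.
hparam_space = {
    "weight_decay":  [0.0, 1e-4, 5e-4],
    "width":         [1, 2, 4],                  # width multiplier (assumed)
    "batch_size":    [32, 64, 128],
    "learning_rate": [0.1, 0.01, 0.001],
    "dropout":       [0.0, 0.25, 0.5],
    "depth":         [1, 2, 3],                  # depth multiplier (assumed)
    "optimizer":     ["momentum_sgd", "adam", "rmsprop"],
}

# A full factorial grid gives |Θ_1| × ... × |Θ_7| = 3^7 = 2187 configurations.
configs = [dict(zip(hparam_space.keys(), values))
           for values in product(*hparam_space.values())]
print(len(configs))  # 2187
```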
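
The Pseudocode row lists Algorithms 2 and 3, which find a perturbation magnitude σ for the PAC-Bayesian and sharpness bounds. A minimal, hypothetical sketch is below, assuming a bisection over σ that stops when the perturbed training accuracy drops by a target amount; the function names, bracketing bounds, and tolerance are assumptions, and `estimate_accuracy` stands in for Algorithm 1.

```python
def find_sigma(estimate_accuracy, clean_acc, target_deviation=0.1,
               lo=0.0, hi=1.0, tol=1e-3, max_iters=20):
    """Bisection search (assumed) for the perturbation scale sigma at which
    accuracy under parameter noise falls `target_deviation` below `clean_acc`.
    `estimate_accuracy(sigma)` plays the role of the paper's Algorithm 1."""
    for _ in range(max_iters):
        mid = 0.5 * (lo + hi)
        deviation = clean_acc - estimate_accuracy(mid)
        if abs(deviation - target_deviation) < tol:
            return mid
        if deviation > target_deviation:
            hi = mid   # perturbation too strong; shrink sigma
        else:
            lo = mid   # perturbation too weak; grow sigma
    return 0.5 * (lo + hi)

# Toy usage with a stand-in accuracy curve that degrades linearly in sigma.
toy_accuracy = lambda sigma: 0.99 * (1.0 - sigma)
print(round(find_sigma(toy_accuracy, clean_acc=0.99), 4))  # ≈ 0.10
```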