Sharp Minima Can Generalize For Deep Nets
Authors: Laurent Dinh, Razvan Pascanu, Samy Bengio, Yoshua Bengio
ICML 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | This paper argues that most notions of flatness are problematic for deep models and cannot be directly applied to explain generalization. Specifically, for deep networks with rectifier units, the authors exploit the particular geometry of parameter space induced by the inherent symmetries of these architectures to build equivalent models corresponding to arbitrarily sharper minima. Furthermore, if a function is allowed to be reparametrized, the geometry of its parameters can change drastically without affecting its generalization properties. |
| Researcher Affiliation | Collaboration | 1 Université de Montréal, Montréal, Canada; 2 DeepMind, London, United Kingdom; 3 Google Brain, Mountain View, United States; 4 CIFAR Senior Fellow. |
| Pseudocode | No | No pseudocode or algorithm blocks are provided in the paper. The paper focuses on theoretical arguments and mathematical derivations. |
| Open Source Code | No | No statement regarding the release of open-source code for the methodology is found in the paper. |
| Open Datasets | No | The paper is theoretical and does not describe experiments using specific datasets, nor does it provide access information for any dataset it might implicitly refer to in background discussion. |
| Dataset Splits | No | The paper focuses on theoretical arguments and does not involve empirical training or validation splits of data. |
| Hardware Specification | No | The paper is theoretical and does not describe experimental setup or hardware specifications. |
| Software Dependencies | No | The paper is theoretical and does not list specific software dependencies with version numbers. |
| Experiment Setup | No | The paper is theoretical and does not describe an experimental setup with hyperparameter values or training configurations. |
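The symmetry argument summarized in the Research Type row rests on the positive homogeneity of rectifier units: scaling one layer's weights up by a factor α > 0 and the next layer's weights down by 1/α leaves the network function unchanged, while the curvature around the minimum can be made arbitrarily sharp. A minimal sketch of this equivalence, using a hypothetical two-layer ReLU network (the layer sizes and the name `forward` are illustrative, not from the paper):

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def forward(x, W1, W2):
    # Two-layer ReLU network: relu(x @ W1) @ W2
    return relu(x @ W1) @ W2

rng = np.random.default_rng(0)
x = rng.normal(size=(5, 3))    # a batch of 5 inputs of dimension 3
W1 = rng.normal(size=(3, 4))
W2 = rng.normal(size=(4, 2))

alpha = 100.0  # any positive scaling factor
y = forward(x, W1, W2)
y_scaled = forward(x, alpha * W1, W2 / alpha)

# ReLU is positively homogeneous: relu(a * z) = a * relu(z) for a > 0,
# so (alpha * W1, W2 / alpha) computes exactly the same function as
# (W1, W2), even though the parameters differ dramatically.
assert np.allclose(y, y_scaled)
```

Because the function is unchanged for every α > 0, any flatness measure that depends on the parameterization (e.g. Hessian eigenvalues along the rescaled directions) can be driven to arbitrarily sharp values without altering the model's predictions or generalization, which is the paper's core objection to such measures.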