Sharp Minima Can Generalize For Deep Nets

Authors: Laurent Dinh, Razvan Pascanu, Samy Bengio, Yoshua Bengio

ICML 2017

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | This paper argues that most notions of flatness are problematic for deep models and cannot be directly applied to explain generalization. Specifically, for deep networks with rectifier units, the particular geometry of parameter space induced by the inherent symmetries of these architectures can be exploited to build equivalent models corresponding to arbitrarily sharper minima. Furthermore, if the function is allowed to be reparametrized, the geometry of its parameters can change drastically without affecting its generalization properties. (A minimal numerical sketch of this scaling argument follows the table.)
Researcher Affiliation | Collaboration | 1 University of Montréal, Montréal, Canada; 2 DeepMind, London, United Kingdom; 3 Google Brain, Mountain View, United States; 4 CIFAR Senior Fellow.
Pseudocode | No | No pseudocode or algorithm blocks are provided in the paper. The paper focuses on theoretical arguments and mathematical derivations.
Open Source Code | No | No statement regarding the release of open-source code for the methodology is found in the paper.
Open Datasets | No | The paper is theoretical and does not describe experiments using specific datasets, nor does it provide access information for any dataset it might implicitly refer to in background discussion.
Dataset Splits | No | The paper focuses on theoretical arguments and does not involve empirical training or validation splits of data.
Hardware Specification | No | The paper is theoretical and does not describe an experimental setup or hardware specifications.
Software Dependencies | No | The paper is theoretical and does not list specific software dependencies with version numbers.
Experiment Setup | No | The paper is theoretical and does not describe an experimental setup with hyperparameter values or training configurations.
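The scaling argument summarized in the Research Type row can be illustrated numerically. The snippet below is a minimal sketch, not code released with the paper: it builds a one-hidden-layer rectifier network, applies the non-negative homogeneity transformation (W1, W2) -> (alpha*W1, W2/alpha), and checks that the loss at a constructed minimum is unchanged while a crude perturbation-based measure of sharpness grows with alpha. The network shapes, the toy data, and the `sharpness_proxy` function are illustrative assumptions, not the paper's exact epsilon-sharpness definition.

```python
# Minimal sketch (assumed implementation, not the authors' code) of the
# alpha-scale transformation for a one-hidden-layer ReLU network:
# T_alpha(W1, W2) = (alpha * W1, W2 / alpha) leaves the function unchanged
# (by non-negative homogeneity of ReLU) but sharpens the surrounding loss surface.
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(x, 0.0)

def forward(x, W1, W2):
    # y = W2 @ relu(W1 @ x): single hidden layer, no biases.
    return W2 @ relu(W1 @ x)

def loss(X, Y, W1, W2):
    # Mean squared error over a small batch.
    preds = np.stack([forward(x, W1, W2) for x in X])
    return float(np.mean((preds - Y) ** 2))

def sharpness_proxy(X, Y, W1, W2, eps=1e-3, n_dirs=20):
    # Crude stand-in for sharpness: largest loss increase over random
    # parameter perturbations of fixed Euclidean radius eps.
    base = loss(X, Y, W1, W2)
    worst = 0.0
    for _ in range(n_dirs):
        d1, d2 = rng.standard_normal(W1.shape), rng.standard_normal(W2.shape)
        norm = np.sqrt(np.sum(d1 ** 2) + np.sum(d2 ** 2))
        d1, d2 = eps * d1 / norm, eps * d2 / norm
        worst = max(worst, loss(X, Y, W1 + d1, W2 + d2) - base)
    return worst

# Toy data and parameters (hypothetical shapes); Y is defined so the current
# parameters sit at a zero-loss minimum by construction.
X = rng.standard_normal((32, 5))
W1 = rng.standard_normal((8, 5))
W2 = rng.standard_normal((3, 8))
Y = np.stack([forward(x, W1, W2) for x in X])

for alpha in [1.0, 10.0, 100.0]:
    W1a, W2a = alpha * W1, W2 / alpha  # alpha-scale transformation
    print(f"alpha={alpha:6.1f}  loss={loss(X, Y, W1a, W2a):.2e}  "
          f"sharpness~{sharpness_proxy(X, Y, W1a, W2a):.2e}")
```

Running the loop shows the loss staying at (numerically) zero for every alpha while the perturbation-based proxy grows roughly with alpha squared, which is the sense in which equivalent minima can be made arbitrarily sharper.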