Deep Learning without Poor Local Minima
Authors: Kenji Kawaguchi
NeurIPS 2016
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | In this paper, we prove a conjecture published in 1989 and also partially address an open problem announced at the Conference on Learning Theory (COLT) 2015. With no unrealistic assumption, we first prove the following statements for the squared loss function of deep linear neural networks with any depth and any widths: 1) the function is non-convex and non-concave, 2) every local minimum is a global minimum, 3) every critical point that is not a global minimum is a saddle point, and 4) there exist bad saddle points (where the Hessian has no negative eigenvalue) for the deeper networks (with more than three layers), whereas there is no bad saddle point for the shallow networks (with three layers). Moreover, for deep nonlinear neural networks, we prove the same four statements via a reduction to a deep linear model under the independence assumption adopted from recent work. As a result, we present an instance, for which we can answer the following question: how difficult is it to directly train a deep model in theory? It is more difficult than the classical machine learning models (because of the non-convexity), but not too difficult (because of the nonexistence of poor local minima). Furthermore, the mathematically proven existence of bad saddle points for deeper models would suggest a possible open problem. We note that even though we have advanced the theoretical foundations of deep learning and non-convex optimization, there is still a gap between theory and practice. *(An illustrative numerical sketch of statement 4 follows this table.)* |
| Researcher Affiliation | Academia | Kenji Kawaguchi, Massachusetts Institute of Technology, kawaguch@mit.edu |
| Pseudocode | No | The paper focuses on theoretical proofs and mathematical derivations. It does not include any pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide any explicit statements about releasing open-source code or links to a code repository. |
| Open Datasets | No | This paper is theoretical and does not involve experimental data or datasets. Therefore, there is no information about publicly available training data. |
| Dataset Splits | No | This paper is theoretical and does not involve experimental data or datasets. Therefore, there is no information about training/test/validation dataset splits. |
| Hardware Specification | No | This paper is theoretical and does not describe any experiments. Therefore, no hardware specifications are provided. |
| Software Dependencies | No | This paper is theoretical and does not describe any experiments or implementations that would require specific software dependencies with version numbers. |
| Experiment Setup | No | This paper is theoretical and does not describe any experimental setup, hyperparameters, or training configurations. |
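
The paper itself ships no code, so the sketch below is not the author's implementation; it is a minimal toy illustration of statement 4 from the abstract quoted above, under assumptions of our own choosing (scalar weights and a single training example x = 1, y = 1). The loss of a "deep linear network" then reduces to L(w) = (w_1 w_2 ... w_H - 1)^2, the origin w = 0 is a critical point whenever there are at least two factors, and we inspect the Hessian eigenvalues there for two and three factors.

```python
# Minimal sketch of statement 4 (not code from the paper): scalar "deep linear
# networks" with a single example x = 1, y = 1, loss L(w) = (prod_k w_k - 1)^2.
import numpy as np

def loss(w):
    """Squared loss of a product-of-scalars deep linear network."""
    return (np.prod(w) - 1.0) ** 2

def hessian(f, w, eps=1e-4):
    """Central finite-difference Hessian of f at w."""
    n = len(w)
    H = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            ei, ej = np.eye(n)[i] * eps, np.eye(n)[j] * eps
            H[i, j] = (f(w + ei + ej) - f(w + ei - ej)
                       - f(w - ei + ej) + f(w - ei - ej)) / (4 * eps ** 2)
    return H

# The origin is a critical point whenever there are >= 2 weight factors.
# 2 factors corresponds to a three-layer network, 3 factors to a four-layer one.
for n_weights in (2, 3):
    w0 = np.zeros(n_weights)
    eig = np.linalg.eigvalsh(hessian(loss, w0))
    print(f"{n_weights} weight factors: Hessian eigenvalues at 0 = {np.round(eig, 3)}")
```

Up to finite-difference error, the two-factor (three-layer) case yields eigenvalues near -2 and 2, so the origin is an ordinary strict saddle with a negative-curvature direction, whereas the three-factor (four-layer) case yields an all-zero Hessian with no negative eigenvalue, even though the origin is still not a local minimum. This mirrors the abstract's claim that bad saddle points appear only for networks with more than three layers.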