Gradient Descent Finds Global Minima of Deep Neural Networks

Authors: Simon Du, Jason Lee, Haochuan Li, Liwei Wang, Xiyu Zhai

ICML 2019

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | The paper proves that gradient descent achieves zero training loss in polynomial time for a deep over-parameterized neural network with residual connections (ResNet); the analysis relies on the particular structure of the Gram matrix induced by the network architecture.
Researcher Affiliation | Academia | (1) Machine Learning Department, Carnegie Mellon University; (2) Data Science and Operations Department, University of Southern California; (3) School of Physics, Peking University; (4) Center for Data Science, Peking University, Beijing Institute of Big Data Research; (5) Key Laboratory of Machine Perception, MOE, School of EECS, Peking University; (6) Department of EECS, Massachusetts Institute of Technology.
Pseudocode | No | The paper presents mathematical derivations and gradient descent update rules but does not include any structured pseudocode or algorithm block (a schematic statement of the update rule and convergence guarantee is sketched after this table).
Open Source Code | No | The paper contains no statement or link indicating that open-source code for the described methodology is available.
Open Datasets | No | The paper works with a theoretical problem setup of n data points with training inputs {x_i}; it does not name a publicly available dataset or give concrete access information for empirical training.
Dataset Splits | No | The paper does not provide any train/validation/test split information.
Hardware Specification | No | The paper does not report the hardware used for running experiments.
Software Dependencies | No | The paper does not list ancillary software dependencies or version numbers.
Experiment Setup | No | The paper states theoretical conditions for convergence, such as bounds on the step size and the required network width, but gives no concrete hyperparameter values or training configurations for empirical experiments (see the numerical sketch following this table).
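
As noted in the Research Type and Pseudocode rows, the algorithm studied is plain gradient descent on the squared training loss, analyzed through an architecture-induced Gram matrix. The LaTeX below is a schematic restatement of that setup, a sketch rather than a verbatim copy of the theorems: K stands for the limiting Gram matrix of whichever architecture is analyzed (fully connected, ResNet, or convolutional ResNet), u(k) for the vector of network predictions at iteration k, and the precise width and step-size requirements are omitted.

\[
  L(\theta) = \frac{1}{2}\sum_{i=1}^{n}\bigl(f(\theta, x_i) - y_i\bigr)^2,
  \qquad
  \theta(k+1) = \theta(k) - \eta\,\nabla_\theta L\bigl(\theta(k)\bigr),
\]
\[
  \|y - u(k)\|_2^2 \;\le\; \Bigl(1 - \tfrac{\eta\,\lambda_{\min}(K)}{2}\Bigr)^{k}\,\|y - u(0)\|_2^2,
\]

with u(k) = (f(θ(k), x_1), …, f(θ(k), x_n)). Positive-definiteness of K (λ_min(K) > 0), together with sufficient over-parameterization, is what yields the zero-training-loss guarantee.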
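
The Experiment Setup row points out that only theoretical step-size and width conditions are given, with no empirical configuration. The short numpy sketch below illustrates the over-parameterization effect in the simplest related setting: full-batch gradient descent on a wide two-layer ReLU network with fixed output signs. This is an illustration, not the paper's deep ResNet setup; the width m, step size eta, iteration count, and random data are assumed values chosen only for the demo.

import numpy as np

# Minimal sketch (not the paper's deep ResNet setting): full-batch gradient
# descent on a heavily over-parameterized two-layer ReLU network
#   f(x) = (1/sqrt(m)) * sum_r a_r * relu(w_r . x),
# trained on a handful of random unit-norm inputs. All sizes, the step size,
# and the data are illustrative assumptions.
rng = np.random.default_rng(0)

n, d, m = 10, 5, 2000        # n data points, input dimension d, hidden width m >> n
eta = 0.5                    # step size (assumed, small enough to be stable here)
steps = 500

X = rng.standard_normal((n, d))
X /= np.linalg.norm(X, axis=1, keepdims=True)   # unit-norm inputs, no two parallel
y = rng.standard_normal(n)                      # arbitrary real-valued labels

W = rng.standard_normal((m, d))                 # trainable first-layer weights
a = rng.choice([-1.0, 1.0], size=m)             # fixed second-layer signs

for k in range(steps):
    pre = X @ W.T                               # (n, m) pre-activations
    act = np.maximum(pre, 0.0)                  # ReLU
    u = act @ a / np.sqrt(m)                    # predictions u(k), shape (n,)
    residual = u - y
    loss = 0.5 * np.sum(residual ** 2)          # squared training loss
    if k % 100 == 0:
        print(f"iter {k:4d}  training loss {loss:.8f}")
    # dL/dw_r = (a_r / sqrt(m)) * sum_i residual_i * 1{w_r . x_i > 0} * x_i
    mask = (pre > 0.0).astype(W.dtype)
    grad = ((mask * residual[:, None]).T @ X) * (a[:, None] / np.sqrt(m))
    W -= eta * grad                             # gradient descent update

pre = X @ W.T
final_loss = 0.5 * np.sum((np.maximum(pre, 0.0) @ a / np.sqrt(m) - y) ** 2)
print("final training loss:", final_loss)

When the width m is much larger than the number of samples n, the printed training loss should shrink roughly geometrically, which is the qualitative behavior the paper's convergence theorems formalize for deep architectures.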