Gradient Descent Finds Global Minima of Deep Neural Networks
Authors: Simon Du, Jason Lee, Haochuan Li, Liwei Wang, Xiyu Zhai
ICML 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | The paper proves that gradient descent achieves zero training loss in polynomial time for deep overparameterized neural networks, including architectures with residual connections (ResNet). The analysis relies on the particular structure of the Gram matrix induced by the neural network architecture (a schematic statement of this guarantee is given after the table). |
| Researcher Affiliation | Academia | (1) Machine Learning Department, Carnegie Mellon University; (2) Data Science and Operations Department, University of Southern California; (3) School of Physics, Peking University; (4) Center for Data Science, Peking University, Beijing Institute of Big Data Research; (5) Key Laboratory of Machine Perception, MOE, School of EECS, Peking University; (6) Department of EECS, Massachusetts Institute of Technology. |
| Pseudocode | No | The paper describes mathematical derivations and update rules for gradient descent but does not include any structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide any explicit statement or link indicating the availability of open-source code for the described methodology. |
| Open Datasets | No | The paper describes a theoretical problem setup with 'n data points' and training inputs {x_i}, but it does not specify a publicly available dataset with concrete access information for empirical training. |
| Dataset Splits | No | The paper does not provide specific dataset split information for validation. |
| Hardware Specification | No | The paper does not provide specific hardware details used for running experiments. |
| Software Dependencies | No | The paper does not provide specific ancillary software details with version numbers. |
| Experiment Setup | No | The paper states theoretical conditions for convergence, such as bounds on the step size and the network width, but it does not provide concrete hyperparameter values or training configurations for empirical experiments (an illustrative, non-authoritative reproduction sketch follows the table). |
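
For context on the Research Type row above, the following is a schematic (not verbatim) statement of the kind of linear-rate guarantee the paper proves. The symbols used here are notational assumptions, not the paper's exact notation: L denotes the squared training loss, η the step size, θ(k) the parameters after k gradient-descent steps, and λ_0 the least eigenvalue of the limiting Gram matrix induced by the architecture.

```latex
% Schematic form of the paper's convergence guarantee (notation assumed here):
% with sufficient overparameterization and a suitably small step size \eta,
% the squared training loss contracts at a geometric rate governed by the
% least eigenvalue \lambda_0 of the limiting Gram matrix.
\[
  L\bigl(\theta(k)\bigr) \;\le\; \Bigl(1 - \tfrac{\eta \lambda_0}{2}\Bigr)^{k} \, L\bigl(\theta(0)\bigr),
  \qquad k = 0, 1, 2, \dots
\]
```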
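
Since the Experiment Setup row notes that the paper reports no empirical configuration, the sketch below is a hypothetical, minimal numerical check rather than the authors' experiment: it trains a wide two-layer ReLU network with plain gradient descent on tiny synthetic data and prints the squared training loss, which the theory predicts should decay roughly geometrically once the width is large enough. All constants (n, d, m, the step size, and the iteration count) are illustrative assumptions, not values from the paper.

```python
# Hypothetical reproduction sketch (the paper itself reports no experiments):
# gradient descent on a wide two-layer ReLU network with a fixed +/-1 output
# layer, trained on tiny synthetic data. With large width m, the squared
# training loss is expected to decay roughly geometrically, in the spirit of
# the paper's guarantee. All constants below are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
n, d, m = 10, 10, 4096        # n data points, input dimension d, hidden width m
eta = 0.5                     # constant step size (assumed, not from the paper)

X = rng.normal(size=(n, d))
X /= np.linalg.norm(X, axis=1, keepdims=True)   # unit-norm inputs, as in the analysis
y = rng.normal(size=n)

W = rng.normal(size=(m, d))                     # first layer, standard normal init
a = rng.choice([-1.0, 1.0], size=m)             # fixed second layer with +/-1 entries

def predict(W):
    # u_i = (1 / sqrt(m)) * sum_r a_r * relu(w_r . x_i)
    return (np.maximum(X @ W.T, 0.0) @ a) / np.sqrt(m)

for k in range(501):
    residual = predict(W) - y
    if k % 100 == 0:
        print(f"step {k:4d}   loss {0.5 * residual @ residual:.6f}")
    active = (X @ W.T > 0.0).astype(float)       # ReLU gate, shape (n, m)
    # gradient of 0.5 * ||u - y||^2 with respect to W, shape (m, d)
    grad = ((active * np.outer(residual, a)).T @ X) / np.sqrt(m)
    W -= eta * grad
```

This only probes the qualitative claim (loss driven toward zero by plain gradient descent on an overparameterized network); it says nothing about the paper's quantitative width or step-size requirements.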