Width Provably Matters in Optimization for Deep Linear Neural Networks
Authors: Simon Du, Wei Hu
ICML 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | We prove that for an $L$-layer fully-connected linear neural network, if the width of every hidden layer is $\tilde{\Omega}(L \cdot r \cdot d_{out} \cdot \kappa^3)$, where $r$ and $\kappa$ are the rank and the condition number of the input data, and $d_{out}$ is the output dimension, then gradient descent with Gaussian random initialization converges to a global minimum at a linear rate. The number of iterations to find an $\epsilon$-suboptimal solution is $O(\kappa \log(\frac{1}{\epsilon}))$. Our polynomial upper bound on the total running time for wide deep linear networks and the $\exp(\Omega(L))$ lower bound for narrow deep linear neural networks [Shamir, 2018] together demonstrate that wide layers are necessary for optimizing deep models. (A minimal illustrative sketch of this setting appears after the table.) |
| Researcher Affiliation | Academia | ¹Carnegie Mellon University, Pittsburgh, PA, USA; ²Princeton University, Princeton, NJ, USA. |
| Pseudocode | No | The paper contains mathematical derivations and theoretical analyses but does not include any pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not contain any statements about releasing code or providing a link to source code. |
| Open Datasets | No | The paper is purely theoretical and does not use or reference any specific datasets for training experiments. It refers to 'input data X' as a mathematical construct. |
| Dataset Splits | No | The paper is theoretical and does not describe any dataset splits for validation or other purposes, as it does not conduct experiments. |
| Hardware Specification | No | The paper is theoretical and does not describe any hardware specifications used for experiments. |
| Software Dependencies | No | The paper is theoretical and does not list any software dependencies with specific version numbers, as it does not describe experimental implementations. |
| Experiment Setup | No | The paper is theoretical and does not describe any specific experimental setup details such as hyperparameters or training settings, as it does not conduct experiments. |
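
The result summarized above concerns gradient descent on the squared loss of a deep linear network $f(x) = W_L \cdots W_1 x$ under Gaussian random initialization with wide hidden layers. Since the paper provides no code, the NumPy sketch below is our own illustration of that setting, not the authors' implementation; the width `m`, initialization scale, step size `eta`, and iteration count are illustrative choices and do not implement the paper's $\tilde{\Omega}(L \cdot r \cdot d_{out} \cdot \kappa^3)$ width requirement or its step-size constants.

```python
# Minimal illustrative sketch (not the paper's code): gradient descent on a
# deep *linear* network W_L ... W_1 with wide hidden layers and Gaussian
# random initialization, on a realizable linear regression problem.
# All sizes and the step size below are illustrative choices.
import numpy as np

rng = np.random.default_rng(0)

n, d_in, d_out = 50, 20, 5            # samples, input dim, output dim
L, m = 4, 256                         # depth and (wide) hidden width

X = rng.standard_normal((d_in, n))
Y = rng.standard_normal((d_out, d_in)) @ X   # realizable linear targets

# Gaussian initialization with entry variance 1/m (norm-preserving in
# expectation); the paper's exact initialization differs in constants.
dims = [d_in] + [m] * (L - 1) + [d_out]
Ws = [rng.standard_normal((dims[i + 1], dims[i])) / np.sqrt(m) for i in range(L)]

def forward(Ws, X):
    H = X
    for W in Ws:
        H = W @ H
    return H

def layer_grads(Ws, X, R):
    """Gradients of 0.5 * ||W_L ... W_1 X - Y||_F^2 w.r.t. each layer,
    given the residual R = W_L ... W_1 X - Y."""
    grads = []
    for i in range(len(Ws)):
        before = X
        for W in Ws[:i]:
            before = W @ before        # product of layers below i, applied to X
        after = np.eye(Ws[i].shape[0])
        for W in Ws[i + 1:]:
            after = W @ after          # product of layers above i
        grads.append(after.T @ R @ before.T)
    return grads

eta = 5e-4                             # illustrative step size
for t in range(1001):
    R = forward(Ws, X) - Y
    loss = 0.5 * np.sum(R ** 2)
    if t % 200 == 0:
        print(f"iter {t:4d}  loss {loss:.3e}")
    grads = layer_grads(Ws, X, R)
    Ws = [W - eta * g for W, g in zip(Ws, grads)]
```

On this small realizable problem the printed loss should decay roughly geometrically, mirroring the linear convergence rate the theorem guarantees for sufficiently wide networks.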