An Improved Analysis of Training Over-parameterized Deep Neural Networks

Authors: Difan Zou, Quanquan Gu

NeurIPS 2019

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | A recent line of research has shown that gradient-based algorithms with random initialization can converge to the global minima of the training loss for overparameterized (i.e., sufficiently wide) deep neural networks. However, the condition on the width of the neural network to ensure the global convergence is very stringent, which is often a high-degree polynomial in the training sample size n (e.g., O(n^24)). In this paper, we provide an improved analysis of the global convergence of (stochastic) gradient descent for training deep neural networks, which only requires a milder over-parameterization condition than previous work in terms of the training sample size and other problem-dependent parameters. The main technical contributions of our analysis include (a) a tighter gradient lower bound that leads to a faster convergence of the algorithm, and (b) a sharper characterization of the trajectory length of the algorithm.
Researcher Affiliation | Academia | Difan Zou, Department of Computer Science, University of California, Los Angeles, CA 90095, knowzou@cs.ucla.edu; Quanquan Gu, Department of Computer Science, University of California, Los Angeles, CA 90095, qgu@cs.ucla.edu
Pseudocode | Yes | Algorithm 1: "(Stochastic) Gradient descent with Gaussian random initialization" (see the illustrative sketch after this table).
Open Source Code | No | The paper does not provide any statement or link indicating that the source code for the described methodology is publicly available.
Open Datasets | No | The paper states properties of the "training data points" (e.g., Assumption 3.1 requires ||x_i||_2 = 1 and (x_i)_d = µ for every x_i) and refers to the "training sample size n." However, it does not use, or provide access information for, any specific named public dataset (e.g., MNIST, CIFAR-10) with citations or links.
Dataset Splits | No | The paper does not provide specific details on training, validation, or test dataset splits. The research is theoretical and does not involve empirical data splitting for experiments.
Hardware Specification | No | The paper is theoretical and does not discuss any specific hardware used for running experiments.
Software Dependencies | No | The paper is theoretical and does not list any specific software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow versions) that would be needed for replication.
Experiment Setup | No | The paper defines the step size η within its theoretical algorithm description, but it does not provide concrete hyperparameter values, training configurations, or system-level settings of the kind found in an empirical experiment-setup section.
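
Below is a minimal sketch of the kind of procedure named in the Pseudocode row, "(Stochastic) Gradient descent with Gaussian random initialization" (Algorithm 1). It is not the authors' implementation: the depth L, width m, step size eta, the He-style N(0, 2/m) initialization variance, the squared loss, the frozen output layer, the full-batch updates, and the synthetic unit-norm inputs are all illustrative assumptions standing in for the paper's general setting.

```python
# Minimal sketch in the spirit of Algorithm 1, "(Stochastic) Gradient descent with
# Gaussian random initialization". All concrete choices (width m, depth L, step size
# eta, N(0, 2/m) hidden-layer variance, squared loss, fixed output layer, full-batch
# updates, synthetic unit-norm data) are illustrative assumptions, not the paper's
# exact setup.
import numpy as np


def init_gaussian(d, m, L, rng):
    """Gaussian random initialization of an L-hidden-layer ReLU network."""
    Ws = [rng.normal(0.0, np.sqrt(2.0 / m), size=(m, d))]                 # input layer
    Ws += [rng.normal(0.0, np.sqrt(2.0 / m), size=(m, m)) for _ in range(L - 1)]
    v = rng.normal(0.0, np.sqrt(1.0 / m), size=m)                         # output layer (kept fixed below)
    return Ws, v


def forward(Ws, v, x):
    """Forward pass; returns the scalar output and cached (pre-activation, activation) pairs."""
    hs, h = [], x
    for W in Ws:
        z = W @ h
        h = np.maximum(z, 0.0)   # ReLU
        hs.append((z, h))
    return float(v @ h), hs


def gd_step(Ws, v, X, y, eta):
    """One full-batch gradient descent step on the squared loss, via manual backprop.

    Only the hidden-layer weights are updated; the output layer v stays at its random
    initialization (a simplification assumed here, not necessarily the paper's setup).
    Replacing the full batch with a random mini-batch gives the stochastic variant.
    """
    grads = [np.zeros_like(W) for W in Ws]
    for x, yi in zip(X, y):
        out, hs = forward(Ws, v, x)
        delta = (out - yi) * v                       # gradient w.r.t. the last hidden activation
        for l in range(len(Ws) - 1, -1, -1):
            z, _ = hs[l]
            delta = delta * (z > 0)                  # ReLU derivative
            h_prev = x if l == 0 else hs[l - 1][1]
            grads[l] += np.outer(delta, h_prev)
            delta = Ws[l].T @ delta                  # propagate to the previous layer
    return [W - eta * g / len(X) for W, g in zip(Ws, grads)]


# Toy run on synthetic data (illustrative only).
rng = np.random.default_rng(0)
n, d, m, L, eta = 8, 10, 256, 3, 0.1
X = rng.normal(size=(n, d))
X /= np.linalg.norm(X, axis=1, keepdims=True)        # unit-norm inputs, echoing ||x_i||_2 = 1
y = rng.choice([-1.0, 1.0], size=n)
Ws, v = init_gaussian(d, m, L, rng)
for _ in range(50):
    Ws = gd_step(Ws, v, X, y, eta)
```

The paper's contribution is the analysis rather than the algorithm: it studies how wide m must be, as a polynomial in the sample size n and other problem-dependent parameters, for such a run to converge to a global minimum of the training loss. The sketch only illustrates the training procedure, not the over-parameterization regime required by the theory.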