Global Convergence of Gradient Descent for Deep Linear Residual Networks
Authors: Lei Wu, Qingcan Wang, Chao Ma
NeurIPS 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We numerically compare the gradient descent dynamics between the ZAS and the near-identity initialization for multi-dimensional deep linear networks. The comparison clearly shows that the convergence of gradient descent with the near-identity initialization involves a saddle point escape process, while the ZAS initialization never encounters any saddle point during the whole optimization process. We provide an extension of the ZAS initialization to the nonlinear case. Moreover, the numerical experiments justify its superiority compared to the standard initializations. |
| Researcher Affiliation | Academia | Lei Wu, Qingcan Wang, Chao Ma; Program in Applied and Computational Mathematics, Princeton University, Princeton, NJ 08544, USA; {leiwu,qingcanw,chaom}@princeton.edu |
| Pseudocode | No | No structured pseudocode or algorithm blocks were found in the paper. |
| Open Source Code | No | The paper does not provide an explicit statement about releasing the source code or a direct link to a code repository for the methodology described. |
| Open Datasets | Yes | The experiments are conducted on Fashion-MNIST [20], where we select 1000 training samples forming the new training set to speed up the computation. |
| Dataset Splits | No | The paper mentions using a 'training set' and refers to testing, but does not provide specific details about any validation set splits (e.g., percentages, sample counts, or explicit references to predefined validation splits). |
| Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models, memory) used for running the experiments were provided in the paper. |
| Software Dependencies | No | No specific software dependencies with version numbers were mentioned in the paper. |
| Experiment Setup | Yes | We manually tune the optimal learning rate for each L. ... The learning rate η = 0.01 for both initializations. ... Depths L = 100, 200, 2000, 10000 are tested, and the learning rate for each depth is tuned to achieve the fastest convergence. Reported settings: ZAS with L=100 (lr=1e-1), L=200 (lr=1e-1), L=2000 (lr=2e-2), L=10000 (lr=2e-3); Xavier with L=100 (lr=1e-3), L=200 (lr=1e-6). Illustrative code sketches of these settings follow the table. |
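To make the comparison described in the Research Type and Experiment Setup rows concrete, here is a minimal NumPy sketch of full-batch gradient descent on a deep linear residual network f(x) = (I + W_L)···(I + W_1)x under the two initializations. It assumes, based on the scheme's name and description, that ZAS zero-initializes every residual block, while the near-identity baseline uses small Gaussian perturbations; the dimensions, depth, teacher map, and step size are illustrative choices, not the paper's, and this is not the authors' code.

```python
import numpy as np

def train(Ws, X, Y, lr, steps):
    """Full-batch GD on 0.5/n * ||(I+W_L)...(I+W_1) X - Y||_F^2."""
    n = X.shape[1]
    losses = []
    for _ in range(steps):
        # Forward pass, caching activations H_0, ..., H_L for backprop.
        Hs = [X]
        for W in Ws:
            Hs.append(Hs[-1] + W @ Hs[-1])
        R = Hs[-1] - Y
        losses.append(0.5 * np.sum(R ** 2) / n)
        # Backward pass: G holds dLoss/dH_l while walking from block L down to 1.
        G = R / n
        grads = []
        for l in reversed(range(len(Ws))):
            grads.append(G @ Hs[l].T)       # dLoss/dW_{l+1} = G_{l+1} H_l^T
            G = G + Ws[l].T @ G             # pull G back through (I + W_{l+1})
        for W, g in zip(Ws, reversed(grads)):
            W -= lr * g
    return losses

rng = np.random.default_rng(0)
d, n, L = 10, 200, 50                       # illustrative sizes, not the paper's
X = rng.standard_normal((d, n))
A = np.eye(d) + 0.3 * rng.standard_normal((d, d)) / np.sqrt(d)  # well-conditioned linear teacher
Y = A @ X

zas  = [np.zeros((d, d)) for _ in range(L)]                     # ZAS: zero residual blocks (assumed)
near = [1e-2 * rng.standard_normal((d, d)) for _ in range(L)]   # near-identity: small perturbations

for name, Ws in [("ZAS", zas), ("near-identity", near)]:
    curve = train(Ws, X, Y, lr=5e-3, steps=1000)
    print(name, [f"{v:.2e}" for v in curve[::250]], f"final={curve[-1]:.2e}")
```

In this toy setting both initializations converge; the sketch only illustrates the training loop and the two initialization rules, not the saddle-point escape behavior the paper reports for deep multi-dimensional networks.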
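For the nonlinear experiments, the table reports a 1000-sample Fashion-MNIST training subset and depth-specific tuned learning rates. The snippet below collects those reported values into a configuration dictionary and shows one assumed way (via torchvision, which the paper does not specify) to build such a subset; which 1000 samples are selected and the batch size are not stated, so those choices are placeholders.

```python
from torch.utils.data import Subset, DataLoader
from torchvision import datasets, transforms

# Reported depth -> learning-rate settings from the Experiment Setup row above.
TUNED_LR = {
    "ZAS":    {100: 1e-1, 200: 1e-1, 2000: 2e-2, 10000: 2e-3},
    "Xavier": {100: 1e-3, 200: 1e-6},
}
SHARED_LR = 0.01  # rate quoted as 0.01 "for both initializations" in one comparison

# Assumed data pipeline: the paper only states that 1000 training samples are
# selected from Fashion-MNIST to speed up computation.
train_full = datasets.FashionMNIST(root="./data", train=True, download=True,
                                   transform=transforms.ToTensor())
train_small = Subset(train_full, list(range(1000)))  # first 1000 samples: a placeholder choice
loader = DataLoader(train_small, batch_size=100, shuffle=True)  # batch size: a placeholder choice
```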