Convergence Analysis of Two-layer Neural Networks with ReLU Activation
Authors: Yuanzhi Li, Yang Yuan
NeurIPS 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | To complement our theory, we are also able to show experimentally that multi-layer networks with this mapping have better performance compared with normal vanilla networks. Our convergence theorem differs from traditional non-convex optimization techniques. We show that SGD converges to optimal in two phases: in phase I, the gradient points in the wrong direction; however, a potential function g gradually decreases. Then in phase II, SGD enters a nice one-point convex region and converges. We also show that the identity mapping is necessary for convergence, as it moves the initial point to a better place for optimization. Experiment verifies our claims. (A hedged architecture sketch based on this description appears after the table.) |
| Researcher Affiliation | Academia | Yuanzhi Li, Computer Science Department, Princeton University, yuanzhil@cs.princeton.edu; Yang Yuan, Computer Science Department, Cornell University, yangyuan@cs.cornell.edu |
| Pseudocode | No | No structured pseudocode or algorithm blocks were found in the paper. |
| Open Source Code | Yes | Our code can be found in the supplementary materials. |
| Open Datasets | Yes | In this experiment, we choose Cifar-10 as the dataset, and all the networks have 56-layers. |
| Dataset Splits | No | The paper mentions 'training set of size 100,000, and test set of size 10,000' in Section 5.2 but does not explicitly detail a validation split or its size. |
| Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models, memory amounts, or detailed computer specifications) were mentioned for running experiments. |
| Software Dependencies | No | No specific software dependencies with version numbers (e.g., 'Python 3.8', 'PyTorch 1.9') were found in the paper. |
| Experiment Setup | Yes | We use batch size 200 and step size 0.001. We run ResLink 5 times with random initialization (‖W‖₂ ≤ 0.6 and ‖W‖_F ≤ 5) and plot the curves by taking the average. (A hedged sketch of this setup appears after the table.) |
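
The "Research Type" row quotes the paper's description of a two-layer ReLU network with an identity mapping, where the mapping moves the initial point to a region that is easier to optimize. Below is a minimal PyTorch sketch of such an architecture, assuming the hidden layer computes ReLU((I + W)x) and the output sums the hidden units; the class name `TwoLayerIdentityReLU` and the exact output layer are illustrative assumptions, not the authors' released code.

```python
import torch
import torch.nn as nn

class TwoLayerIdentityReLU(nn.Module):
    """Illustrative sketch (not the authors' code): a two-layer ReLU network
    whose first layer carries an identity mapping, i.e. the hidden layer
    computes ReLU((I + W) x) and the output sums the hidden units."""

    def __init__(self, dim: int):
        super().__init__()
        # Starting W at zero keeps (I + W) at the identity, the "better place
        # for optimization" the abstract refers to; this choice is an assumption.
        self.W = nn.Parameter(torch.zeros(dim, dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        identity = torch.eye(self.W.shape[0], device=x.device, dtype=x.dtype)
        hidden = torch.relu(x @ (identity + self.W).T)  # ReLU((I + W) x)
        return hidden.sum(dim=-1)                       # scalar output per example
```

Dropping the `identity` term recovers the "vanilla" network that the quoted passage compares against.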
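The "Experiment Setup" row reports batch size 200, step size 0.001, and curves averaged over 5 runs from norm-bounded random initializations. A rough sketch of that protocol is shown below, again assuming PyTorch; `random_init`, `train_once`, and the `model`/`loss_fn`/`data_loader` objects they use are hypothetical stand-ins, since the paper's actual code is provided only in its supplementary materials.

```python
import torch

BATCH_SIZE = 200   # batch size reported in the paper
STEP_SIZE = 0.001  # SGD step size reported in the paper
NUM_RUNS = 5       # curves are averaged over 5 random initializations

def random_init(dim: int) -> torch.Tensor:
    """Hypothetical initializer: rescale a random W so that its spectral and
    Frobenius norms respect the reported bounds (assumed to be upper bounds)."""
    W = torch.randn(dim, dim)
    spectral = torch.linalg.matrix_norm(W, ord=2)
    frobenius = torch.linalg.matrix_norm(W, ord="fro")
    scale = min(0.6 / spectral.item(), 5.0 / frobenius.item(), 1.0)
    return W * scale

def train_once(model, loss_fn, data_loader, num_steps):
    """One SGD run with the reported step size; returns the loss curve.
    data_loader is assumed to yield (x, y) batches of size BATCH_SIZE."""
    optimizer = torch.optim.SGD(model.parameters(), lr=STEP_SIZE)
    curve = []
    for _, (x, y) in zip(range(num_steps), data_loader):
        optimizer.zero_grad()
        loss = loss_fn(model(x), y)
        loss.backward()
        optimizer.step()
        curve.append(loss.item())
    return torch.tensor(curve)

# The plotted curve would be the average over NUM_RUNS independent runs, e.g.:
# mean_curve = torch.stack([train_once(...) for _ in range(NUM_RUNS)]).mean(dim=0)
```

Averaging over the five runs smooths out the randomness of the individual initializations, which is how the paper reports its curves.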