Convergence Analysis of Two-layer Neural Networks with ReLU Activation

Authors: Yuanzhi Li, Yang Yuan

NeurIPS 2017 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | To complement our theory, we are also able to show experimentally that multi-layer networks with this mapping have better performance compared with vanilla networks. Our convergence theorem differs from traditional non-convex optimization techniques. We show that SGD converges to the optimum in two phases: in phase I, the gradient points in the wrong direction, but a potential function g gradually decreases; in phase II, SGD enters a nice one-point convex region and converges. We also show that the identity mapping is necessary for convergence, as it moves the initial point to a better place for optimization. Experiments verify our claims. (See the one-point convexity note below the table.)
Researcher Affiliation | Academia | Yuanzhi Li, Computer Science Department, Princeton University (yuanzhil@cs.princeton.edu); Yang Yuan, Computer Science Department, Cornell University (yangyuan@cs.cornell.edu)
Pseudocode | No | No structured pseudocode or algorithm blocks were found in the paper.
Open Source Code | Yes | Our code can be found in the supplementary materials.
Open Datasets | Yes | In this experiment, we choose Cifar-10 as the dataset, and all the networks have 56 layers.
Dataset Splits | No | The paper mentions 'training set of size 100,000, and test set of size 10,000' in Section 5.2 but does not explicitly detail a validation split or its size.
Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models, memory amounts, or detailed computer specifications) were mentioned for running the experiments.
Software Dependencies | No | No specific software dependencies with version numbers (e.g., 'Python 3.8', 'PyTorch 1.9') were found in the paper.
Experiment Setup | Yes | We use batch size 200, step size 0.001. We run ResLink 5 times with random initialization (‖W‖₂ ≈ 0.6 and ‖W‖_F ≈ 5), and plot the curves by taking the average. (See the training-loop sketch below the table.)
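
Note on the "one point convex region" cited in the Research Type row: the phrase refers to a condition of roughly the following form. This is a hedged restatement in generic notation (f for the objective, w for the parameters, w* for the reference point, delta for the convexity parameter), not a verbatim definition from the paper:

    f \text{ is } \delta\text{-one-point strongly convex w.r.t. } w^* \text{ on a domain } \mathcal{D} \text{ if}
    \forall\, w \in \mathcal{D}:\quad \langle -\nabla f(w),\; w^* - w \rangle \;\ge\; \delta\, \lVert w^* - w \rVert_2^2 .

Once SGD reaches a region where this holds, the (negative) gradient correlates with the direction toward w*, which is what the phase-II convergence claim relies on.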
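The Experiment Setup row reports batch size 200, step size 0.001, and curves averaged over 5 random initializations on CIFAR-10. The sketch below shows one way to reproduce that protocol in PyTorch; it is a sketch under stated assumptions, not the authors' code. TwoLayerResLink is only an illustrative stand-in for a network with an identity link (the paper's actual 56-layer architectures are not reconstructed here), and run_once, epochs, and the 0.01 initialization scale are our own choices.

# Hedged sketch of the reported protocol: batch size 200, step size 0.001,
# 5 runs with different random initializations, training curves averaged.
# TwoLayerResLink is an illustrative stand-in, NOT the paper's 56-layer model.
import numpy as np
import torch
import torch.nn as nn
import torchvision
import torchvision.transforms as T

class TwoLayerResLink(nn.Module):
    """Toy two-layer ReLU net whose hidden layer is ReLU((I + W) x)."""
    def __init__(self, dim=3 * 32 * 32, num_classes=10):
        super().__init__()
        self.W = nn.Parameter(0.01 * torch.randn(dim, dim))  # random initialization
        self.out = nn.Linear(dim, num_classes)

    def forward(self, x):
        x = x.view(x.size(0), -1)
        h = torch.relu(x + x @ self.W.t())  # identity link: ReLU((I + W) x) per sample
        return self.out(h)

def run_once(seed, epochs=10, device="cpu"):
    torch.manual_seed(seed)
    train_set = torchvision.datasets.CIFAR10(
        root="./data", train=True, download=True, transform=T.ToTensor())
    loader = torch.utils.data.DataLoader(train_set, batch_size=200, shuffle=True)
    model = TwoLayerResLink().to(device)
    opt = torch.optim.SGD(model.parameters(), lr=0.001)  # step size 0.001
    loss_fn = nn.CrossEntropyLoss()
    curve = []
    for _ in range(epochs):
        total = 0.0
        for x, y in loader:
            x, y = x.to(device), y.to(device)
            opt.zero_grad()
            loss = loss_fn(model(x), y)
            loss.backward()
            opt.step()
            total += loss.item() * x.size(0)
        curve.append(total / len(train_set))
    return curve

# Average the training-loss curves over 5 random initializations.
curves = [run_once(seed) for seed in range(5)]
mean_curve = np.mean(curves, axis=0)

Swapping TwoLayerResLink for the same model without the identity term (h = ReLU(x @ W.t())) gives the plain-network baseline against which the identity-link variant is compared.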