A Convergence Theory for Deep Learning via Over-Parameterization

Authors: Zeyuan Allen-Zhu, Yuanzhi Li, Zhao Song

ICML 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Figure 1: Landscapes of the CIFAR10 image-classification training objective F(W) near points W = W_t on the SGD training trajectory. The x and y axes represent the gradient direction ∇F(W_t) and the most negatively curved direction of the Hessian after smoothing (approximately found by Oja's method (Allen-Zhu & Li, 2017; 2018)). The z axis represents the objective value. Observation. As far as minimizing the objective is concerned, the (negative) gradient direction sufficiently decreases the training objective. This is consistent with our main findings, Theorems 3 and 4. Using second-order information gives little help. Remark 2. The task is CIFAR10 (for CIFAR100 or CIFAR10 with noisy label, see Figures 2 through 7 in the appendix). Remark 4. The six plots correspond to epochs 5, 40, 90, 120, 130 and 160. We start with learning rate 0.1, and decrease it to 0.01 at epoch 81, and to 0.001 at epoch 122. SGD with momentum 0.9 is used. The training code is unchanged from (Yang, 2018) and we only write new code for plotting such landscapes. (A landscape-plotting sketch along these two directions follows the table.)
Researcher Affiliation | Collaboration | 1 Microsoft Research AI, 2 Stanford University, 3 Princeton University, 4 UT-Austin, 5 University of Washington, 6 Harvard University.
Pseudocode | No | The paper does not contain any sections or figures explicitly labeled as 'Pseudocode' or 'Algorithm'.
Open Source Code | No | The paper refers to an external source, 'The training code is unchanged from (Yang, 2018)', but does not provide a link to its own open-source code for the methodology described.
Open Datasets | Yes | Remark 2. The task is CIFAR10 (for CIFAR100 or CIFAR10 with noisy label, see Figures 2 through 7 in the appendix).
Dataset Splits | No | The paper does not provide specific details on training, validation, and test dataset splits such as percentages or sample counts.
Hardware Specification | No | The paper does not provide specific details about the hardware used for the experiments, such as GPU or CPU models.
Software Dependencies | No | The paper mentions 'PyTorch' in Figure 1's caption, but it does not specify the version number of PyTorch or any other software dependencies.
Experiment Setup | Yes | We start with learning rate 0.1, and decrease it to 0.01 at epoch 81, and to 0.001 at epoch 122. SGD with momentum 0.9 is used. The training code is unchanged from (Yang, 2018) and we only write new code for plotting such landscapes. (A sketch of this optimizer and learning-rate schedule follows the table.)
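To make the quoted Figure 1 procedure concrete, here is a minimal PyTorch sketch of plotting the objective F(W) along the gradient direction and an estimated negative-curvature direction. It is not the authors' code: the model, data, grid ranges, and the crude Hessian-vector-product power iteration (standing in for Oja's method and the Hessian smoothing mentioned in the caption) are all placeholder assumptions.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Placeholder model and batch; the paper's experiment uses a CIFAR10 classifier instead.
model = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 10))
x, y = torch.randn(256, 32), torch.randint(0, 10, (256,))
loss_fn = nn.CrossEntropyLoss()
params = list(model.parameters())

def objective():
    # F(W): training objective at the current parameter values
    return loss_fn(model(x), y)

def flat(tensors):
    return torch.cat([t.reshape(-1) for t in tensors])

# Direction 1: the normalized gradient of F at the current point W_t.
loss = objective()
grads = torch.autograd.grad(loss, params, create_graph=True)
g = flat(grads)
d1 = (g / g.norm()).detach()

# Direction 2: crude power iteration on Hessian-vector products to approximate the
# most negatively curved direction (the paper instead uses Oja's method on a
# smoothed Hessian; this loop is only a stand-in for that step).
v = torch.randn_like(d1)
v = v / v.norm()
shift = 10.0  # iterate on (shift*I - H) so the most negative eigen-direction dominates
for _ in range(30):
    hv = flat(torch.autograd.grad(g @ v, params, retain_graph=True))
    v = shift * v - hv
    v = v / v.norm()
d2 = v.detach()

def split_like(vec, ref):
    # Reshape a flat vector back into tensors shaped like the parameters.
    out, i = [], 0
    for p in ref:
        out.append(vec[i:i + p.numel()].view_as(p))
        i += p.numel()
    return out

# Evaluate F(W_t + a*d1 + b*d2) on a small grid; these values are the z axis of the plot.
base = [p.detach().clone() for p in params]
alphas = torch.linspace(-0.5, 0.5, 11)
with torch.no_grad():
    for a in alphas:
        row = []
        for b in alphas:
            for p, p0, s in zip(params, base, split_like(a * d1 + b * d2, params)):
                p.copy_(p0 + s)
            row.append(objective().item())
        print(" ".join("%.3f" % val for val in row))
    for p, p0 in zip(params, base):
        p.copy_(p0)  # restore the original parameters W_t
```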
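The experiment-setup row reports only the optimizer and schedule, so the following is a minimal, hypothetical PyTorch sketch of that recipe, not the authors' training code (which follows (Yang, 2018)). The model and the empty data loader are placeholders; only the SGD settings and milestones come from the quoted text.

```python
import torch
import torch.nn as nn
from torch.optim.lr_scheduler import MultiStepLR

# Placeholder model; the actual run trains a CIFAR10 network from (Yang, 2018).
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))
loss_fn = nn.CrossEntropyLoss()

# Reported recipe: SGD with momentum 0.9, learning rate 0.1,
# decayed by 10x at epoch 81 and again at epoch 122.
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
scheduler = MultiStepLR(optimizer, milestones=[81, 122], gamma=0.1)

train_loader = []  # placeholder; a torchvision CIFAR10 DataLoader in practice

for epoch in range(160):
    for images, labels in train_loader:
        optimizer.zero_grad()
        loss = loss_fn(model(images), labels)
        loss.backward()
        optimizer.step()
    scheduler.step()  # advances the learning-rate schedule once per epoch
```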