reproducibilityindex.ai

Fast Mixing of Stochastic Gradient Descent with Normalization and Weight Decay

Authors: Zhiyuan Li, Tianhao Wang, Dingli Yu

NeurIPS 2022

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Though our convergence result is asymptotic, we verify in simplified settings that the phenomena predicted by our theory happen with LR and WD factor λ of practical scale (see Section 6 for details of experiments). We also show empirically that the mixing process exists in practical settings, and is beneficial for generalization.
Researcher Affiliation	Academia	Zhiyuan Li (Princeton University, zhiyuanli@cs.princeton.edu); Tianhao Wang (Yale University, tianhao.wang@yale.edu); Dingli Yu (Princeton University, dingliy@cs.princeton.edu)
Pseudocode	No	The paper does not contain any structured pseudocode or algorithm blocks.
Open Source Code	Yes	We include the code and datasets along with instructions needed to reproduce the main experimental results in the supplemental material.
Open Datasets	Yes	Beyond the toy example, we further study the limiting diffusion of PreResNet on CIFAR-10 [26]. (Reference [26]: Alex Krizhevsky, Geoffrey Hinton, et al. Learning multiple layers of features from tiny images.)
Dataset Splits	No	Figure 1a shows the train and test accuracy of scale-invariant PreResNet trained by SGD+WD on CIFAR-10 with standard data augmentation. The paper mentions training and testing, but no explicit validation split details are provided.
Hardware Specification	No	No specific hardware details (like GPU/CPU models or types of resources) are mentioned in the paper's text. The checklist for experimental details explicitly states 'No' for including compute resources.
Software Dependencies	No	The paper does not list specific software dependencies with version numbers.
Experiment Setup	Yes	In our experiments, we choose D = 10, σ = 0.3, the WD factor λ = 0.05, and LR ∈ {10⁻², 10⁻³, 10⁻⁴} (Section 6.1). We train a 32-layer PreResNet [27] with initial LR = 0.8 and WD factor λ = 5 × 10⁻⁴ (Section 6.2).
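To make the quoted Section 6.1 hyperparameters concrete, here is a minimal sketch of SGD+WD on a scale-invariant objective, using D = 10, σ = 0.3, λ = 0.05 and the three learning rates. The loss ℓ(w) = 1 − ⟨w/‖w‖, u + σξ⟩, the Gaussian noise model, the target direction u, and every function name are hypothetical stand-ins, not the paper's toy example or released code. The sketch shows the update w ← (1 − ηλ)w − ηg. Because a scale-invariant loss has ∇ℓ(w) ⟂ w, the squared norm evolves as ‖w′‖² = (1 − ηλ)²‖w‖² + η²‖g‖².

```python
import math
import random

# Values quoted from Section 6.1; the loss and noise model below are hypothetical.
D = 10                   # dimension of the toy parameter vector
SIGMA = 0.3              # assumed here to be the std of isotropic noise on the target
WD = 0.05                # weight decay factor lambda
LRS = [1e-2, 1e-3, 1e-4]

def norm(x):
    return math.sqrt(sum(xi * xi for xi in x))

def dot(x, y):
    return sum(xi * yi for xi, yi in zip(x, y))

def loss(w, u):
    """Hypothetical scale-invariant loss: depends on w only through w / ||w||."""
    return 1.0 - dot(w, u) / norm(w)

def stochastic_grad(w, u, sigma, rng):
    """Gradient of the per-sample loss 1 - <w/||w||, u + sigma*xi>.

    The result is orthogonal to w and its magnitude scales as 1 / ||w||,
    as for any scale-invariant loss.
    """
    r = norm(w)
    v = [wi / r for wi in w]
    target = [ui + sigma * rng.gauss(0.0, 1.0) for ui in u]
    c = dot(v, target)
    # grad = -(I - v v^T) target / r
    return [-(ti - c * vi) / r for ti, vi in zip(target, v)]

def sgd_wd_step(w, g, lr, wd):
    """One SGD step with weight decay: w <- w - lr * (g + wd * w)."""
    return [(1.0 - lr * wd) * wi - lr * gi for wi, gi in zip(w, g)]

def run(lr, steps, seed=0):
    """Run SGD+WD from a random start; returns (final weights, final clean loss)."""
    rng = random.Random(seed)
    u = [1.0] + [0.0] * (D - 1)          # hypothetical target direction
    w = [rng.gauss(0.0, 1.0) for _ in range(D)]
    for _ in range(steps):
        g = stochastic_grad(w, u, SIGMA, rng)
        w = sgd_wd_step(w, g, lr, WD)
    return w, loss(w, u)

if __name__ == "__main__":
    for lr in LRS:
        w_final, l_final = run(lr, 1000)
        print(f"lr={lr:g}  ||w||={norm(w_final):.4f}  loss={l_final:.4f}")
```

Since ‖g‖ scales as 1/‖w‖, the angular step size behaves roughly like η/‖w‖². Weight decay shrinks ‖w‖ while gradient noise grows it, so in this sketch the two together set the effective learning rate on the sphere.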