Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Fast Mixing of Stochastic Gradient Descent with Normalization and Weight Decay
Authors: Zhiyuan Li, Tianhao Wang, Dingli Yu
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Though our convergence result is asymptotic, we verify in simplified settings that the phenomena predicted by our theory happens with LR and WD factor λ of practical scale (see Section 6 for details of experiments). We also show empirically that the mixing process exists in practical settings, and is beneficial for generalization. |
| Researcher Affiliation | Academia | Zhiyuan Li (Princeton University), Tianhao Wang (Yale University), Dingli Yu (Princeton University) |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | We include the code and datasets along with instructions needed to reproduce the main experimental results in the supplemental material. |
| Open Datasets | Yes | Beyond the toy example, we further study the limiting diffusion of PreResNet on CIFAR-10 [26]. (Reference [26]: Alex Krizhevsky, Geoffrey Hinton, et al. Learning multiple layers of features from tiny images.) |
| Dataset Splits | No | Figure 1a shows the train and test accuracy of scale-invariant PreResNet trained by SGD+WD on CIFAR-10 with standard data augmentation. The paper mentions training and testing, but no explicit validation split details are provided. |
| Hardware Specification | No | No specific hardware details (like GPU/CPU models or types of resources) are mentioned in the paper's text. The checklist for experimental details explicitly states 'No' for including compute resources. |
| Software Dependencies | No | The paper does not list specific software dependencies with version numbers. |
| Experiment Setup | Yes | In our experiments, we choose D = 10, σ = 0.3, the WD factor λ = 0.05, and LR ∈ {10⁻², 10⁻³, 10⁻⁴} (Section 6.1). We train a 32-layer PreResNet [27] with initial LR = 0.8 and WD factor λ = 5 × 10⁻⁴ (Section 6.2). |
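
For readers who want to re-run the reported settings, the hyperparameters in the Experiment Setup row can be collected into a small configuration sketch. This is not the authors' code: the names `ToyConfig`, `toy_grid`, and `PRERESNET_CIFAR10` are hypothetical, and the sketch assumes the learning rates are 1e-2, 1e-3, and 1e-4 and that the PreResNet weight decay factor is 5e-4.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyConfig:
    """Hypothetical container for the Section 6.1 toy-example settings."""
    dim: int = 10                # D
    sigma: float = 0.3           # sigma
    weight_decay: float = 0.05   # WD factor lambda
    lr: float = 1e-2             # learning rate (one value from the sweep)


def toy_grid():
    """Return one config per learning rate in the assumed sweep."""
    return [ToyConfig(lr=lr) for lr in (1e-2, 1e-3, 1e-4)]


# Hypothetical summary of the Section 6.2 setting; keys are illustrative only.
PRERESNET_CIFAR10 = {
    "dataset": "CIFAR-10",
    "depth": 32,
    "initial_lr": 0.8,
    "weight_decay": 5e-4,
}

if __name__ == "__main__":
    for cfg in toy_grid():
        print(cfg)
    print(PRERESNET_CIFAR10)
```

The sketch only records values; the training loop, data augmentation, and learning-rate schedule are not specified in the table and are left out.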