Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
On the Generalization of Stochastic Gradient Descent with Momentum
Authors: Ali Ramezani-Kebrya, Kimon Antonakopoulos, Volkan Cevher, Ashish Khisti, Ben Liang
JMLR 2024 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we validate the insights obtained in our theoretical results using experimental evaluation. Our main goal is to study how adding momentum affects the generalization and convergence of SGD. We first investigate the performance of SGDEM when applied to both CIFAR10 (Krizhevsky) and notMNIST datasets for nonconvex loss functions. |
| Researcher Affiliation | Academia | Ali Ramezani-Kebrya, Department of Informatics, University of Oslo, and Visual Intelligence Centre Integreat, Norwegian Centre for Knowledge-driven Machine Learning, Gaustadalléen 23B, Ole-Johan Dahls hus, 0373 Oslo, Norway; Kimon Antonakopoulos, Laboratory for Information and Inference Systems (LIONS), EPFL, Station 11, CH-1015 Lausanne, Switzerland; Volkan Cevher, Laboratory for Information and Inference Systems (LIONS), EPFL, Station 11, CH-1015 Lausanne, Switzerland; Ashish Khisti, Department of Electrical and Computer Engineering, University of Toronto, 40 St. George Street, Toronto, ON M5S 2E4, Canada; Ben Liang, Department of Electrical and Computer Engineering, University of Toronto, 40 St. George Street, Toronto, ON M5S 2E4, Canada |
| Pseudocode | No | The paper describes algorithms like SGDM and SGDEM using mathematical update rules (e.g., "w_{t+1} = w_t + µ(w_t − w_{t−1}) − α_t ∇_w ℓ(w_t; z_{i_t}) (SGDM)") rather than explicit pseudocode blocks or algorithm listings. |
| Open Source Code | No | The paper does not provide any explicit statement about releasing source code, a link to a code repository, or mention of code in supplementary materials. |
| Open Datasets | Yes | We first investigate the performance of SGDEM when applied to both CIFAR10 (Krizhevsky) and notMNIST datasets for nonconvex loss functions. Fig. 1: Validation loss and generalization error of SGDEM when training ResNet-18 (He et al., 2016) on ImageNet (Deng et al., 2009) in a distributed setting with 4 GPUs under tuned step-size and global minibatch size of 128. |
| Dataset Splits | Yes | Fig. 1: Validation loss and generalization error of SGDEM when training ResNet-18 (He et al., 2016) on ImageNet (Deng et al., 2009) in a distributed setting with 4 GPUs under tuned step-size and global minibatch size of 128. For each td, the momentum is set to µd = 0.9 in the first td epochs and then zero for the next 90 − td epochs. SGDM is a special form of SGDEM with td = 90. The details are provided in Section 5 and Appendix L. Fig. 3: Validation accuracy and generalization gap of SGDEM when training ResNet-18 on ImageNet in a distributed setting with 4 GPUs under tuned step-size and global minibatch size of 128. |
| Hardware Specification | Yes | Details of ImageNet experiments. The global minibatch size and weight decay are set to 128 and 5×10⁻⁵, respectively. For each td, the momentum is set to µd = 0.9 in the first td epochs and then zero for the next 90 − td epochs. We use a cluster with 4 NVIDIA 2080 Ti GPUs with the following CPU details: Intel(R) Xeon(R) CPU E5-2650 v4 @ 2.20GHz; 48 cores; GPU-to-GPU bandwidth: unidirectional 10GB/s and bidirectional 15GB/s. |
| Software Dependencies | No | The paper mentions various algorithms and models like SGD, SGDM, SGDEM, ResNet-18, ResNet-20, and Adam, but it does not specify any software dependencies (e.g., libraries, frameworks) with version numbers that would be required for replication. |
| Experiment Setup | Yes | We set T to 50000 and 14000 for CIFAR10 and notMNIST experiments, respectively. For each value of µd, we add momentum for 0-10 epochs. For each pair of (µd, td), we repeat the experiments 10 times with random initializations. SGDM can be viewed as a special form of SGDEM when the momentum is added for the entire training (i.e., td = T). For 10 epochs and without data augmentation, we train ResNet-20 on CIFAR10 and a feedforward fully connected neural network with 1000 hidden nodes on notMNIST. ... We set the step-size α = 0.01. The minibatch size is set to 10. We use 10 realizations of SGDEM to evaluate the average performance. |
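The SGDEM scheme quoted in the table (the heavy-ball update w_{t+1} = w_t + µ(w_t − w_{t−1}) − α_t ∇_w ℓ(w_t; z_{i_t}), with momentum µd applied only for the first td steps and then set to zero) can be illustrated with a minimal sketch on a toy quadratic loss. This is not the paper's implementation (no code was released); the function name, toy objective, and constants are illustrative assumptions.

```python
import numpy as np

def sgdem_quadratic_demo(mu_d=0.9, t_d=50, total_steps=100, alpha=0.01):
    """Sketch of SGDEM on the toy loss l(w) = 0.5 * ||w||^2.

    Heavy-ball update (as quoted from the paper):
        w_{t+1} = w_t + mu_t * (w_t - w_{t-1}) - alpha * grad(w_t)
    with mu_t = mu_d for the first t_d steps and 0 afterwards.
    SGDM is the special case t_d = total_steps.
    """
    rng = np.random.default_rng(0)
    w = rng.standard_normal(5)
    w_prev = w.copy()          # w_{-1} = w_0, so the first momentum term vanishes
    for t in range(total_steps):
        mu = mu_d if t < t_d else 0.0   # momentum only in the early phase
        grad = w                        # gradient of 0.5 * ||w||^2 is w itself
        w, w_prev = w + mu * (w - w_prev) - alpha * grad, w
    return w
```

Under this schedule, the early heavy-ball phase accelerates the decrease of the loss, and the plain-SGD tail continues to contract the iterate toward the minimizer at zero.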