Understanding the Role of Momentum in Stochastic Gradient Methods

Authors: Igor Gitman, Hunter Lang, Pengchuan Zhang, Lin Xiao

NeurIPS 2019

Reproducibility assessment (each entry lists the variable, the extracted result, and the LLM's supporting response):
Research Type: Experimental. "In addition, by combining the results on convergence rates and stationary distributions, we obtain sometimes counter-intuitive practical guidelines for setting the learning rate and momentum parameters. ... We evaluate the average final loss for a large grid of parameters α, β and ν on three problems: a 2-dimensional quadratic function (where all of our assumptions are satisfied), logistic regression on the MNIST [16] dataset (where the quadratic assumption is approximately satisfied, but gradient noise comes from mini-batches) and ResNet-18 [10] on CIFAR-10 [13] (where all of our assumptions are likely violated). Figure 3 shows the results of this experiment."
Researcher Affiliation: Industry. Igor Gitman, Hunter Lang, Pengchuan Zhang, Lin Xiao; Microsoft Research AI, Redmond, WA 98052, USA. {igor.gitman, hunter.lang, penzhan, lin.xiao}@microsoft.com
Pseudocode: No. The paper describes the QHM algorithm with its mathematical update, equation (6), and discusses its dynamics, but it does not present the method in a pseudocode block or a clearly labeled algorithm section.
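
For reference, a minimal sketch of the quasi-hyperbolic momentum (QHM) update in the standard form (Ma and Yarats, 2019), using the same parameter names α, β, ν as the quoted excerpts; this is an illustration, not the authors' code:

```python
import numpy as np

def qhm_step(x, buf, grad, alpha, beta, nu):
    """One QHM update (learning rate alpha, momentum beta, averaging weight nu).

    buf <- beta * buf + (1 - beta) * grad          # exponential moving average
    x   <- x - alpha * ((1 - nu) * grad + nu * buf)

    nu = 0 recovers plain SGD; nu = 1 recovers SGD with (normalized) momentum.
    """
    buf = beta * buf + (1 - beta) * grad
    x = x - alpha * ((1 - nu) * grad + nu * buf)
    return x, buf
```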
Open Source Code: Yes. "The code of all of our experiments is available at https://github.com/Kipok/understanding-momentum."
Open Datasets: Yes. "Next, we evaluate the average final loss for a large grid of parameters α, β and ν on three problems: a 2-dimensional quadratic function (where all of our assumptions are satisfied), logistic regression on the MNIST [16] dataset (where the quadratic assumption is approximately satisfied, but gradient noise comes from mini-batches) and ResNet-18 [10] on CIFAR-10 [13] (where all of our assumptions are likely violated)."
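
Both datasets are standard public benchmarks. The paper does not describe its data pipeline (the linked repository has the authors' actual code); a minimal, hypothetical sketch of obtaining the two datasets via torchvision, with illustrative paths and transforms:

```python
import torchvision
import torchvision.transforms as T

# Download the standard public copies of the two datasets used in the paper.
# Root directory and transform here are assumptions, not the authors' setup.
mnist = torchvision.datasets.MNIST(
    root="./data", train=True, download=True, transform=T.ToTensor()
)
cifar10 = torchvision.datasets.CIFAR10(
    root="./data", train=True, download=True, transform=T.ToTensor()
)
```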
Dataset Splits: No. The paper uses MNIST and CIFAR-10, which have standard predefined splits, but it does not explicitly state training, validation, or test split percentages or sample counts, nor does it cite a source for the specific splits used.
Hardware Specification: No. The paper describes experiments and their outcomes but does not provide any specific details about the hardware used (e.g., GPU models, CPU types, or memory specifications).
Software Dependencies: No. The paper describes the algorithms and experiments but does not provide version numbers for any software dependencies, such as the programming language, libraries, or frameworks used (e.g., Python, PyTorch, or TensorFlow versions).
Experiment Setup: Yes. "In Section 6, by combining our results in Sections 4 and 5, we obtain new and, in some cases, counter-intuitive insight into how to set these parameters in practice. ... Figure 2: Changes in the shape and size of the stationary distribution with respect to α, β, and ν on a 2-dimensional quadratic problem. Each picture shows the last 5000 iterates of QHM on a contour plot. The first picture of each row is a reference and the other pictures should be compared to it. The second pictures show how the stationary distribution changes when we decrease α."
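
To make the described setup concrete, here is a small self-contained sketch in the spirit of Figure 2: QHM run on a noisy 2-dimensional quadratic, keeping the last 5000 iterates as a sample from the stationary distribution. The matrix A, noise scale, and parameter values are illustrative assumptions, not the authors' exact configuration:

```python
import numpy as np

rng = np.random.default_rng(0)

# A 2-dimensional quadratic f(x) = 0.5 * x^T A x with additive gradient noise.
# A and noise_std are illustrative choices, not values from the paper.
A = np.array([[2.0, 0.0], [0.0, 0.5]])
noise_std = 0.5

def noisy_grad(x):
    return A @ x + noise_std * rng.standard_normal(2)

def run_qhm(alpha, beta, nu, steps=20_000, keep_last=5_000):
    x = np.array([3.0, 3.0])
    buf = np.zeros(2)
    trace = []
    for t in range(steps):
        g = noisy_grad(x)
        buf = beta * buf + (1 - beta) * g          # momentum buffer
        x = x - alpha * ((1 - nu) * g + nu * buf)  # QHM step
        if t >= steps - keep_last:
            trace.append(x.copy())
    return np.array(trace)

# Last 5000 iterates approximate the stationary distribution for one (α, β, ν);
# sweeping these parameters over a grid gives the kind of comparison in the paper.
iterates = run_qhm(alpha=0.1, beta=0.9, nu=0.7)
avg_final_loss = 0.5 * np.mean(np.einsum("ti,ij,tj->t", iterates, A, iterates))
print(f"average final loss: {avg_final_loss:.4f}")
```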