Strength of Minibatch Noise in SGD

Authors: Liu Ziyin, Kangqiao Liu, Takashi Mori, Masahito Ueda

ICLR 2022

Reproducibility assessment. Each entry below lists the variable, the assessed result, and the LLM response supporting it.
Research Type: Experimental
LLM Response: This work presents the first systematic study of the SGD noise and fluctuations close to a local minimum. We first analyze the SGD noise in linear regression in detail and then derive a general formula for approximating SGD noise in different types of minima. For reference, the relationship of this work to the previous works is shown in Table 1. Appendix A (Experiments): We run a 1d experiment in Figure 4(a) and high-dimensional experiments in Figures 4(b)-(c), where we choose D = 2 for visualization.
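
To make the quoted linear-regression analysis concrete, here is a minimal sketch of how the minibatch noise covariance could be measured empirically at the minimum of a linear regression loss. The synthetic data, batch size, and all variable names are illustrative assumptions, not taken from the paper.

```python
import numpy as np

# Minimal sketch (not from the paper): estimate the minibatch-gradient noise
# covariance of linear regression at its minimum. Data, batch size, and all
# variable names are illustrative assumptions.
rng = np.random.default_rng(0)
N, d, S = 10000, 2, 32              # samples, dimension, minibatch size
X = rng.normal(size=(N, d))
y = X @ np.ones(d) + 0.5 * rng.normal(size=N)

# Closed-form minimizer of the quadratic loss (1/2N) ||Xw - y||^2
w_star = np.linalg.solve(X.T @ X, X.T @ y)

def minibatch_grad(w, idx):
    Xb, yb = X[idx], y[idx]
    return Xb.T @ (Xb @ w - yb) / len(idx)

# At w*, the full-batch gradient vanishes, so the covariance of many
# minibatch gradients is the SGD noise covariance at the minimum.
grads = np.stack([minibatch_grad(w_star, rng.choice(N, S, replace=False))
                  for _ in range(5000)])
print("empirical SGD noise covariance at the minimum:\n", np.cov(grads.T))
```

For a quadratic loss, an empirical covariance estimated this way can then be compared against the paper's analytical formulas.
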
Researcher Affiliation: Academia
LLM Response: Liu Ziyin, Kangqiao Liu, Takashi Mori, & Masahito Ueda (The University of Tokyo)
Pseudocode: No
LLM Response: The paper provides mathematical derivations and discusses algorithms (SGD), but it does not include any pseudocode or algorithm blocks.
Open Source Code: No
LLM Response: The paper does not provide any statements about releasing code or links to source code repositories.
Open Datasets: Yes
LLM Response: We train a two-layer tanh neural network on MNIST and plot the variance of its training loss in the first epoch with fixed λ = 0.5. We train a logistic regressor on the MNIST dataset with a large learning rate (of order O(1)).
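
The quoted MNIST setup can be approximated as below; this is a minimal sketch assuming PyTorch/torchvision, interpreting λ = 0.5 as the weight-decay coefficient, and choosing the hidden width, batch size, and learning rate arbitrarily since the excerpt does not fix them.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

# Minimal sketch of the quoted setup: a two-layer tanh network on MNIST,
# recording the per-batch training loss over the first epoch so its variance
# can be computed. Hidden width, batch size, learning rate, and reading
# lambda = 0.5 as weight decay are all assumptions.
train_set = datasets.MNIST(root="data", train=True, download=True,
                           transform=transforms.ToTensor())
loader = DataLoader(train_set, batch_size=64, shuffle=True)

model = nn.Sequential(nn.Flatten(), nn.Linear(784, 100), nn.Tanh(),
                      nn.Linear(100, 10))
opt = torch.optim.SGD(model.parameters(), lr=0.01, weight_decay=0.5)
loss_fn = nn.CrossEntropyLoss()

losses = []
for xb, yb in loader:                  # first (and only) epoch
    opt.zero_grad()
    loss = loss_fn(model(xb), yb)
    loss.backward()
    opt.step()
    losses.append(loss.item())

print("variance of training loss in the first epoch:",
      torch.tensor(losses).var().item())
```

Recording per-batch losses is one natural reading of "variance of its training loss in the first epoch"; an epoch-averaged quantity would hide exactly the minibatch fluctuations at issue.
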
Dataset Splits: No
LLM Response: The paper mentions using the MNIST dataset and training models but does not specify the train/validation/test split percentages or sample counts needed for reproduction. While MNIST has a standard split (60,000 training and 10,000 test images), the paper does not state which portions were used.
Hardware Specification: No
LLM Response: The paper does not provide specific details about the hardware (e.g., GPU/CPU models, memory) used to run the experiments.
Software Dependencies: No
LLM Response: The paper does not provide specific version numbers for any software dependencies or libraries used in the experiments.
Experiment Setup: Yes
LLM Response: We train a two-layer tanh neural network on MNIST and plot the variance of its training loss in the first epoch with fixed λ = 0.5. We train a logistic regressor on the MNIST dataset with a large learning rate (of order O(1)). In Figure 3-Left, we run a 1d experiment with λ = 1, N = 10000, and σ² = 0.25. In Figure 3-Right, we plot a standard case where the optimal regularization strength γ is vanishing. The parameters are set to a = 1, λ = 0.5, S = 1.
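
For the quoted 1d experiment (λ = 1, N = 10000, σ² = 0.25), a minimal sketch follows. The data-generating model, learning rate, batch size, and step count are assumptions; only λ, N, and σ² come from the paper.

```python
import numpy as np

# Minimal sketch of a 1d experiment with the quoted values lambda = 1,
# N = 10000, sigma^2 = 0.25. The data model (E[x^2] = lambda, w* = 0 plus
# label noise), learning rate, batch size, and step count are assumptions.
rng = np.random.default_rng(1)
lam, N, sigma2 = 1.0, 10000, 0.25
eta, S, steps = 0.1, 1, 200000          # learning rate, batch size, SGD steps

x = rng.normal(scale=np.sqrt(lam), size=N)
y = np.sqrt(sigma2) * rng.normal(size=N)

w, trace = 0.0, []
for t in range(steps):
    i = rng.integers(N, size=S)
    grad = np.mean(x[i] * (x[i] * w - y[i]))   # minibatch gradient
    w -= eta * grad
    if t > steps // 2:                          # discard burn-in
        trace.append(w)

print("empirical stationary variance of w:", np.var(trace))
```

Discarding the first half of the trajectory as burn-in isolates the stationary fluctuations around the minimum, which is the regime the noise-strength analysis concerns.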