Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026.
Strength of Minibatch Noise in SGD
Authors: Liu Ziyin, Kangqiao Liu, Takashi Mori, Masahito Ueda
ICLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | This work presents the first systematic study of the SGD noise and fluctuations close to a local minimum. We first analyze the SGD noise in linear regression in detail and then derive a general formula for approximating SGD noise in different types of minima. For reference, the relationship of this work to the previous works is shown in Table 1. Appendix A (Experiments): We run 1d experiment in Figure 4(a) and high dimensional experiments in Figures 4(b)-(c), where we choose D = 2 for visualization. |
| Researcher Affiliation | Academia | Liu Ziyin, Kangqiao Liu, Takashi Mori, & Masahito Ueda; The University of Tokyo |
| Pseudocode | No | The paper provides mathematical derivations and discusses algorithms (SGD), but it does not include any pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide any statements about releasing code or links to source code repositories. |
| Open Datasets | Yes | We train a two-layer tanh neural network on MNIST and plot the variance of its training loss in the first epoch with fixed λ = 0.5. We train a logistic regressor on the MNIST dataset with a large learning rate (of order O(1)). |
| Dataset Splits | No | The paper mentions using the MNIST dataset and training models but does not specify the train/validation/test split percentages or sample counts for reproduction. While MNIST has a standard split, it is not explicitly mentioned here. |
| Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., GPU/CPU models, memory) used to run the experiments. |
| Software Dependencies | No | The paper does not provide specific version numbers for any software dependencies or libraries used in the experiments. |
| Experiment Setup | Yes | We train a two-layer tanh neural network on MNIST and plot the variance of its training loss in the first epoch with fixed λ = 0.5. We train a logistic regressor on the MNIST dataset with a large learning rate (of order O(1)). In Figure 3-Left, we run a 1d experiment with λ = 1, N = 10000 and σ² = 0.25. In Figure 3-Right, we plot a standard case where the optimal regularization strength γ is vanishing. The parameters are set to be a = 1, λ = 0.5, S = 1. |
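
Since no code is released, the kind of 1d experiment quoted in the Experiment Setup row can only be approximated. The following is a minimal sketch, not the authors' implementation. It assumes that λ is the learning rate, S the minibatch size, N the dataset size, σ² the label-noise variance, and a the input variance; this page does not confirm that mapping. The function name `sgd_weight_variance`, the zero ground-truth weight, the squared loss, sampling with replacement, and the step counts are all hypothetical choices made for illustration.

```python
import random
import statistics


def sgd_weight_variance(lr=0.5, n=10_000, sigma2=0.25, a=1.0, batch_size=1,
                        steps=50_000, burn_in=5_000, w0=1.0, seed=0):
    """Estimate the variance of a 1d SGD iterate on noisy linear regression.

    Assumed setup (not taken from the paper): inputs x ~ N(0, a), labels
    y = eps with eps ~ N(0, sigma2) (true weight 0), loss 0.5 * (w*x - y)**2,
    minibatches drawn with replacement from a fixed dataset of size n.
    """
    rng = random.Random(seed)
    xs = [rng.gauss(0.0, a ** 0.5) for _ in range(n)]
    ys = [rng.gauss(0.0, sigma2 ** 0.5) for _ in range(n)]

    w = w0
    trace = []
    for step in range(steps):
        batch = [rng.randrange(n) for _ in range(batch_size)]
        # Minibatch gradient of the squared loss with respect to w.
        grad = sum(xs[i] * (w * xs[i] - ys[i]) for i in batch) / batch_size
        w -= lr * grad
        if step >= burn_in:
            trace.append(w)  # record iterates only after the burn-in phase
    return statistics.pvariance(trace)


if __name__ == "__main__":
    print("estimated variance of w:", sgd_weight_variance())
```

The defaults reuse a = 1, λ = 0.5, S = 1, N = 10000, and σ² = 0.25 from the quoted text purely as illustrative values; they are combined from two different figures and do not reproduce either one. Setting `sigma2=0.0` removes the label noise, in which case the iterate should settle at the true weight and the measured variance should collapse toward zero.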