Strength of Minibatch Noise in SGD
Authors: Liu Ziyin, Kangqiao Liu, Takashi Mori, Masahito Ueda
ICLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | This work presents the first systematic study of SGD noise and fluctuations close to a local minimum. We first analyze the SGD noise in linear regression in detail and then derive a general formula for approximating SGD noise in different types of minima. For reference, the relationship of this work to previous works is shown in Table 1. Appendix A (Experiments): We run a 1d experiment in Figure 4(a) and high-dimensional experiments in Figures 4(b)-(c), where we choose D = 2 for visualization. (A hedged linear-regression sketch of this noise setup is given after the table.) |
| Researcher Affiliation | Academia | Liu Ziyin, Kangqiao Liu, Takashi Mori, & Masahito Ueda (The University of Tokyo) |
| Pseudocode | No | The paper provides mathematical derivations and discusses algorithms (SGD), but it does not include any pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide any statements about releasing code or links to source code repositories. |
| Open Datasets | Yes | We train a two-layer tanh neural network on MNIST and plot the variance of its training loss in the first epoch with fixed λ = 0.5. We train a logistic regressor on the MNIST dataset with a large learning rate (of order O(1)). |
| Dataset Splits | No | The paper mentions using the MNIST dataset and training models but does not specify the train/validation/test split percentages or sample counts needed for reproduction. While MNIST has a standard split, the paper does not explicitly state that this split is used. |
| Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., GPU/CPU models, memory) used to run the experiments. |
| Software Dependencies | No | The paper does not provide specific version numbers for any software dependencies or libraries used in the experiments. |
| Experiment Setup | Yes | We train a two-layer tanh neural network on MNIST and plot the variance of its training loss in the first epoch with fixed λ = 0.5. We train a logistic regressor on the MNIST dataset with a large learning rate (of order O(1)). In Figure 3-Left, we run a 1d experiment with λ = 1, N = 10000, and σ² = 0.25. In Figure 3-Right, we plot a standard case where the optimal regularization strength γ is vanishing. The parameters are set to be a = 1, λ = 0.5, S = 1. (A hedged training sketch of the MNIST setup follows the table.) |
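
The "Research Type" row quotes the paper's analysis of SGD noise in linear regression near a minimum. As a reference point, the following is a minimal sketch, not the authors' code, of how the minibatch-gradient noise covariance can be estimated empirically for L2-regularized linear regression; the sample count, dimension, batch size, and noise level here are illustrative choices rather than the paper's exact settings.

```python
import numpy as np

# Hypothetical illustration: estimate the minibatch-gradient noise covariance
# of SGD for L2-regularized linear regression at the full-batch minimum.
# N, d, S, lam, sigma2 are assumed values, not the paper's settings.

rng = np.random.default_rng(0)
N, d, S = 10_000, 2, 16            # samples, parameter dimension, minibatch size
sigma2 = 0.25                      # label-noise variance
lam = 0.5                          # weight-decay (L2) strength

X = rng.normal(size=(N, d))
w_true = rng.normal(size=d)
y = X @ w_true + rng.normal(scale=np.sqrt(sigma2), size=N)

def grad(w, idx):
    """Gradient of 0.5*mean squared error + 0.5*lam*||w||^2 on the subset idx."""
    Xb, yb = X[idx], y[idx]
    return Xb.T @ (Xb @ w - yb) / len(idx) + lam * w

# Regularized least-squares minimum, where the full-batch gradient vanishes.
w_star = np.linalg.solve(X.T @ X / N + lam * np.eye(d), X.T @ y / N)

full_grad = grad(w_star, np.arange(N))            # approximately zero
samples = np.stack([
    grad(w_star, rng.choice(N, size=S, replace=False)) - full_grad
    for _ in range(5_000)
])
C_hat = samples.T @ samples / len(samples)        # empirical noise covariance
print("empirical minibatch-noise covariance:\n", C_hat)
```

With sampling without replacement within each batch, the measured covariance shrinks with batch size roughly as the standard finite-population factor (N - S)/(S(N - 1)) times the per-sample gradient covariance, which is the kind of batch-size dependence of the noise strength that the paper characterizes.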
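The "Open Datasets" and "Experiment Setup" rows quote MNIST experiments: a two-layer tanh network whose per-minibatch training-loss variance is tracked in the first epoch, and a logistic regressor trained with an O(1) learning rate. The sketch below is an assumed reconstruction rather than the authors' code: the hidden width, batch size, and learning rate are guesses, and the weight decay of 0.5 stands in for the quoted λ = 0.5.

```python
# Minimal sketch (assumed reconstruction, not the authors' code): a two-layer
# tanh network trained on MNIST with plain SGD, recording the per-minibatch
# training loss during the first epoch so its variance can be inspected.
import torch
from torch import nn
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

train_set = datasets.MNIST("data", train=True, download=True,
                           transform=transforms.ToTensor())
loader = DataLoader(train_set, batch_size=64, shuffle=True)

model = nn.Sequential(nn.Flatten(),
                      nn.Linear(28 * 28, 128), nn.Tanh(),   # hidden width assumed
                      nn.Linear(128, 10))
opt = torch.optim.SGD(model.parameters(), lr=0.1,
                      weight_decay=0.5)   # weight decay playing the role of lambda
loss_fn = nn.CrossEntropyLoss()

losses = []
for x, t in loader:                        # one epoch
    opt.zero_grad()
    loss = loss_fn(model(x), t)
    loss.backward()
    opt.step()
    losses.append(loss.item())

losses = torch.tensor(losses)
print("variance of per-batch training loss in epoch 1:", losses.var().item())
```

Replacing the model with a single `nn.Linear(28 * 28, 10)` layer and raising the learning rate to order one gives the quoted logistic-regression variant.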