Power-Law Escape Rate of SGD

Authors: Takashi Mori, Liu Ziyin, Kangqiao Liu, Masahito Ueda

ICML 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "In Section 5.1, we experimentally verify the decoupling approximation for the entire training dynamics. In Section 5.2, we measure the SGD noise strength and confirm that it is indeed proportional to the loss function near a minimum. In Section 5.3, we experimentally test the validity of Eq. (15) for the escape rate."
Researcher Affiliation | Academia | "1 Center for Emergent Matter Science, RIKEN, Saitama, Japan; 2 Department of Physics, The University of Tokyo, Tokyo, Japan; 3 Institute for Physics of Intelligence, The University of Tokyo, Tokyo, Japan."
Pseudocode | No | The paper contains no pseudocode or algorithm blocks; procedures are described through mathematical derivations and textual explanations.
Open Source Code | No | The paper provides no statement or link regarding the availability of open-source code for the described methodology.
Open Datasets | Yes | "We consider a binary classification problem using the first 10^4 samples of the MNIST dataset... First, we consider training of the Fashion-MNIST dataset... Second, we consider training of the CIFAR-10 dataset..."
Dataset Splits | No | The paper mentions training data but does not explicitly provide the training/validation/test splits needed to reproduce the experiments.
Hardware Specification | No | The paper does not describe the hardware used to run its experiments, such as specific GPU models, CPU types, or cloud resources.
Software Dependencies | No | The paper does not provide version numbers for any software components, such as programming languages, libraries, or frameworks.
Experiment Setup | Yes | "We fix η = 0.01 and B = 100." "Starting from the Glorot initialization, the network is trained by SGD of the mini-batch size B = 100 and η = 0.1 for the mean-square loss." "We fix B = 100 in both cases, and η = 0.1 for the fully connected network and η = 0.05 for the convolutional network."
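The quoted setup (Glorot initialization, mini-batch SGD with B = 100 and η = 0.01 on a mean-square loss) can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: the two-layer tanh network, synthetic data, and all dimensions are assumptions chosen only to show the training loop structure.

```python
# Hedged sketch of the quoted setup: Glorot-initialized network trained by
# mini-batch SGD (B = 100, eta = 0.01) on a mean-square loss.
# Architecture and data are illustrative assumptions, not the paper's.
import numpy as np

rng = np.random.default_rng(0)

n_samples, d_in, d_hidden = 1000, 20, 32
X = rng.standard_normal((n_samples, d_in))
y = (X[:, 0] > 0).astype(float)[:, None]  # toy binary targets in {0, 1}

def glorot(fan_in, fan_out):
    # Glorot (Xavier) uniform: U(-a, a) with a = sqrt(6 / (fan_in + fan_out))
    a = np.sqrt(6.0 / (fan_in + fan_out))
    return rng.uniform(-a, a, size=(fan_in, fan_out))

W1, W2 = glorot(d_in, d_hidden), glorot(d_hidden, 1)
eta, B = 0.01, 100  # hyperparameters quoted in the paper

def forward(X, W1, W2):
    h = np.tanh(X @ W1)
    return h, h @ W2

_, out0 = forward(X, W1, W2)
loss0 = np.mean((out0 - y) ** 2)  # mean-square loss before training

for step in range(200):
    idx = rng.choice(n_samples, size=B, replace=False)  # sample a mini-batch
    Xb, yb = X[idx], y[idx]
    h, out = forward(Xb, W1, W2)
    err = 2.0 * (out - yb) / B                     # dL/d(out) for MSE
    gW2 = h.T @ err                                # gradient w.r.t. W2
    gW1 = Xb.T @ ((err @ W2.T) * (1.0 - h ** 2))   # backprop through tanh
    W1 -= eta * gW1                                # SGD update
    W2 -= eta * gW2

_, out1 = forward(X, W1, W2)
loss1 = np.mean((out1 - y) ** 2)  # loss after 200 SGD steps
```

Per-step gradients fluctuate around the full-batch gradient, which is exactly the SGD noise whose strength the paper measures near a minimum.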