The Implicit Regularization of Dynamical Stability in Stochastic Gradient Descent

Authors: Lei Wu, Weijie J. Su

ICML 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Additionally, numerical experiments are provided to support our theoretical findings.
Researcher Affiliation | Academia | School of Mathematical Sciences, Peking University, Beijing, China; Center for Machine Learning Research, Peking University, Beijing, China; Wharton Statistics and Data Science Department, University of Pennsylvania, Philadelphia, USA.
Pseudocode | No | The paper does not contain any pseudocode or algorithm blocks. It focuses on theoretical analysis and numerical results presented in figures.
Open Source Code | No | The paper does not provide any explicit statement or link indicating that the source code for the described methodology is publicly available.
Open Datasets | No | The paper uses synthetic data generated according to specified distributions (e.g., v_i ~ iid Unif(S^{d-1}); see the sampling sketch after the table). It does not use or provide access to a publicly available or open dataset.
Dataset Splits | No | The paper does not explicitly describe train/validation/test dataset splits. While it mentions a "training set" in general terms, specific split proportions or sample counts are not provided.
Hardware Specification | No | The paper does not provide specific details about the hardware used to conduct the experiments, such as GPU models, CPU types, or memory specifications.
Software Dependencies | No | The paper does not specify any software dependencies with version numbers (e.g., programming languages, libraries, frameworks) needed to reproduce the experiments.
Experiment Setup | Yes | Gradient clipping is automatically switched off after around 4000 iterations. After that, SGD can stably converge to a global minimum without clipping operations. This implies that around the convergent minimum, linear stability should be satisfied and consequently, it is not surprising to observe that Tr(G(θ_t)) ≈ 2/η when θ_t nearly converges. Another interesting observation is that during the whole training process, Tr(G(θ_t)) keeps decreasing, which in turn causes the continued decrease of the path norm. (See the monitoring sketch after the table.)
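
The synthetic inputs mentioned in the Open Datasets row, v_i ~ iid Unif(S^{d-1}), are straightforward to regenerate. Below is a minimal NumPy sketch; the function name `sample_unit_sphere` and the sizes n=100, d=20 are illustrative, not values taken from the paper.

```python
import numpy as np

def sample_unit_sphere(n, d, seed=None):
    """Draw n points i.i.d. uniformly on the unit sphere S^{d-1}.

    Normalizing standard Gaussian vectors yields the uniform distribution
    on the sphere, by rotational invariance of the Gaussian.
    """
    rng = np.random.default_rng(seed)
    v = rng.standard_normal((n, d))
    return v / np.linalg.norm(v, axis=1, keepdims=True)

# Example: 100 inputs in dimension 20 (illustrative sizes).
V = sample_unit_sphere(100, 20, seed=0)
assert np.allclose(np.linalg.norm(V, axis=1), 1.0)
```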
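
The Experiment Setup row quotes the paper's observation that gradient clipping switches off as training stabilizes and that Tr(G(θ_t)) settles near the linear-stability threshold 2/η. The sketch below shows one way to monitor these quantities during SGD. It assumes G(θ) = (1/n) Σ_i ∇f(x_i; θ) ∇f(x_i; θ)^T, the Gram matrix of model-output gradients (consult the paper for the exact definition), and the two-layer ReLU model, width, step size, and clipping threshold are all illustrative rather than the paper's settings.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, m = 100, 20, 50            # samples, input dim, hidden width (illustrative)
eta, clip = 0.2, 1.0             # step size and clipping threshold (illustrative)

# Synthetic data: inputs uniform on the sphere, realizable smooth targets.
X = rng.standard_normal((n, d))
X /= np.linalg.norm(X, axis=1, keepdims=True)
y = np.tanh(X @ rng.standard_normal(d))

# Two-layer ReLU net f(x) = a @ relu(W x); per-sample loss 0.5*(f(x_i) - y_i)^2.
W = rng.standard_normal((m, d)) / np.sqrt(d)
a = rng.standard_normal(m) / np.sqrt(m)

def trace_G():
    """Tr(G) under the assumed definition G = (1/n) sum_i grad f(x_i) grad f(x_i)^T,
    i.e. the mean squared norm of the model-output gradient over the training set."""
    pre = X @ W.T                          # (n, m) pre-activations
    act = np.maximum(pre, 0.0)             # relu(Wx)
    mask = (pre > 0).astype(float)         # relu'(Wx), in {0, 1}
    # ||grad_a f||^2 = ||relu(Wx)||^2;  ||grad_W f||^2 = sum_k a_k^2 relu'_k (||x|| = 1)
    return np.mean(np.sum(act**2, axis=1) + np.sum(a**2 * mask, axis=1))

clipped = 0
for t in range(1, 20001):
    i = rng.integers(n)                      # single-sample SGD
    pre = W @ X[i]
    act = np.maximum(pre, 0.0)
    r = a @ act - y[i]                       # residual on sample i
    g_a = r * act                            # gradient w.r.t. a
    g_W = r * np.outer(a * (pre > 0), X[i])  # gradient w.r.t. W
    norm = np.sqrt(np.sum(g_a**2) + np.sum(g_W**2))
    scale = min(1.0, clip / (norm + 1e-12))  # gradient clipping
    clipped += scale < 1.0
    a -= eta * scale * g_a
    W -= eta * scale * g_W
    if t % 4000 == 0:
        print(f"t={t:5d}  clipped frac={clipped/4000:.2f}  "
              f"Tr(G)={trace_G():7.3f}  2/eta={2/eta:.1f}")
        clipped = 0
```

Whether the monitored Tr(G) actually approaches 2/η depends on the problem and step size; the sketch reproduces only the bookkeeping (clipping activity and the stability diagnostic), not the paper's experiment.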