The Implicit Regularization of Dynamical Stability in Stochastic Gradient Descent
Authors: Lei Wu, Weijie J. Su
ICML 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Additionally, numerical experiments are provided to support our theoretical findings. |
| Researcher Affiliation | Academia | 1School of Mathematical Sciences, Peking University, Beijing, China 2Center for Machine Learning Research, Peking University, Beijing, China 3Wharton Statistics and Data Science Department, University of Pennsylvania, Philadelphia, USA. |
| Pseudocode | No | The paper does not contain any pseudocode or algorithm blocks. It focuses on theoretical analysis and numerical results presented in figures. |
| Open Source Code | No | The paper does not provide any explicit statements or links indicating that the source code for the methodology described is publicly available. |
| Open Datasets | No | The paper uses synthetic data generated from specified distributions (e.g., inputs drawn i.i.d. uniformly from the unit sphere, v_i ~ Unif(S^{d-1}); see the sampling sketch after this table). It does not use or provide access to a publicly available or open dataset. |
| Dataset Splits | No | The paper does not explicitly describe train/validation/test dataset splits. While it mentions “training set” in general terms, specific proportions or sample counts for splits are not provided. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used to conduct the experiments, such as GPU models, CPU types, or memory specifications. |
| Software Dependencies | No | The paper does not specify any software dependencies with version numbers (e.g., programming languages, libraries, frameworks) that would be needed to reproduce the experiments. |
| Experiment Setup | Yes | Gradient clipping is automatically switched off after around 4,000 iterations; beyond that point, SGD converges stably to a global minimum without any clipping. This implies that linear stability holds near the convergent minimum, so it is not surprising to observe Tr(G(θt)) ≤ 2/η once θt has nearly converged. Another interesting observation is that Tr(G(θt)) keeps decreasing throughout training, which in turn drives the continued decrease of the path norm. |
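As a hedged illustration of the synthetic data noted in the Open Datasets row, the sketch below draws inputs i.i.d. uniformly from the unit sphere S^{d-1} by normalizing standard Gaussian vectors; the sample count and dimension are placeholders, not values taken from the paper.

```python
import numpy as np

def sample_unit_sphere(n, d, seed=0):
    """Draw n points i.i.d. uniformly on the unit sphere S^{d-1}.

    Standard construction: isotropic Gaussian vectors normalized to unit length.
    """
    rng = np.random.default_rng(seed)
    v = rng.standard_normal((n, d))
    return v / np.linalg.norm(v, axis=1, keepdims=True)

# Example: 1000 synthetic inputs in dimension 20 (placeholder sizes).
X = sample_unit_sphere(1000, 20)
```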
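The Experiment Setup row describes SGD with gradient clipping while the stability quantity Tr(G(θt)) is tracked against the threshold 2/η. Below is a minimal sketch of such a loop, assuming G(θ) is the Gram (Gauss–Newton) matrix of per-sample model gradients so that Tr(G) = (1/n) Σ_i ||∇θ f(x_i; θ)||²; the model architecture, learning rate, batch size, and clipping threshold are illustrative placeholders, not the authors' settings.

```python
import torch

torch.manual_seed(0)
n, d, width = 200, 20, 64
eta, clip_norm, batch = 0.05, 1.0, 16   # placeholder hyperparameters

# Synthetic inputs on the unit sphere and random targets (placeholders).
X = torch.randn(n, d)
X = X / X.norm(dim=1, keepdim=True)
y = torch.randn(n)

model = torch.nn.Sequential(
    torch.nn.Linear(d, width), torch.nn.ReLU(), torch.nn.Linear(width, 1)
)

def trace_G():
    # Tr(G) = (1/n) * sum_i ||grad_theta f(x_i; theta)||^2, accumulated per sample.
    total = 0.0
    for i in range(n):
        model.zero_grad()
        model(X[i : i + 1]).squeeze().backward()
        total += sum(p.grad.pow(2).sum().item() for p in model.parameters())
    model.zero_grad()
    return total / n

opt = torch.optim.SGD(model.parameters(), lr=eta)
for t in range(5001):
    idx = torch.randint(0, n, (batch,))
    loss = 0.5 * (model(X[idx]).squeeze() - y[idx]).pow(2).mean()
    opt.zero_grad()
    loss.backward()
    torch.nn.utils.clip_grad_norm_(model.parameters(), clip_norm)  # gradient clipping
    opt.step()
    if t % 500 == 0:
        print(f"iter {t:5d}  loss {loss.item():.4f}  "
              f"Tr(G) {trace_G():.3f}  2/eta {2 / eta:.3f}")
```

The printout lets one check, in the spirit of the quoted setup, whether Tr(G(θt)) settles below the stability threshold 2/η as training converges.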