Parameter Symmetry and Noise Equilibrium of Stochastic Gradient Descent

Authors: Liu Ziyin, Mingze Wang, Hongchao Li, Lei Wu

NeurIPS 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We apply our theory to understand specific problems and present numerical results in Section 5. All the proofs are presented in the Appendix.
Researcher Affiliation | Collaboration | Liu Ziyin (Massachusetts Institute of Technology, NTT Research, ziyinl@mit.edu); Mingze Wang (Peking University, mingzewang@stu.pku.edu.cn); Hongchao Li (The University of Tokyo, lhc@cat.phys.s.u-tokyo.ac.jp); Lei Wu (Peking University, leiwu@math.pku.edu.cn)
Pseudocode | No | No pseudocode or algorithm blocks were found.
Open Source Code | No | Checklist question: "Does the paper provide open access to the data and code, with sufficient instructions to faithfully reproduce the main experimental results, as described in supplemental material?" Answer: [No]. Justification: "The code or data of the experiments are simple and easy to reproduce following the description in the main text."
Open Datasets | Yes | Here, we give the details for the experiment in Figure 2. We train a two-layer linear net with d0 = d2 = 30 and d = 40. The input data is x ∼ N(0, 1), and y = x + ε, where ε is i.i.d. Gaussian with unit variance.
Dataset Splits | No | The paper mentions training and testing phases but does not explicitly provide details about training/validation/test dataset splits, such as percentages or sample counts for a validation set.
Hardware Specification | No | Checklist question: "For each experiment, does the paper provide sufficient information on the computer resources (type of compute workers, memory, time of execution) needed to reproduce the experiments?" Answer: [No]. Justification: "The experiments can be simply conducted on personal computers."
Software Dependencies | No | The paper does not explicitly list specific software dependencies with version numbers (e.g., Python version, or library versions such as PyTorch or TensorFlow).
Experiment Setup | Yes | "Here, we give the details for the experiment in Figure 2. We train a two-layer linear net with d0 = d2 = 30 and d = 40. The input data is x ∼ N(0, 1), and y = x + ε, where ε is i.i.d. Gaussian with unit variance." (Section A.2). "When the learning rate (η = 0.008) is too large, SGD diverges (orange line). However, when one starts training at a small learning rate (0.001) and increases η to 0.008 after 5000 iterations, the training remains stable." (Figure 4 caption). "Unless it is the independent variable, η, S and d are set to be 0.1, 100 and 2000, respectively." (Figure 8 caption). A hedged training sketch based on these quoted details appears below the table.
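
For illustration only, the following is a minimal PyTorch sketch of the Figure 2 setup as quoted above: a two-layer linear network with d0 = d2 = 30 and d = 40, synthetic data x ∼ N(0, 1) with targets y = x + ε (unit-variance Gaussian noise), and a learning-rate warmup from 0.001 to 0.008 after 5000 iterations. The mean-squared-error loss, batch size, and total step count are assumptions not stated in this excerpt; this is not the authors' implementation.

```python
# Hypothetical sketch of the quoted setup, not the authors' code.
import torch

d0, d, d2 = 30, 40, 30                 # input, hidden, and output widths (from the quoted setup)
batch_size, total_steps = 100, 20000   # assumed values, not stated in the excerpt

# Two-layer *linear* network: no activation between the layers.
model = torch.nn.Sequential(
    torch.nn.Linear(d0, d, bias=False),
    torch.nn.Linear(d, d2, bias=False),
)
optimizer = torch.optim.SGD(model.parameters(), lr=0.001)  # small warmup learning rate

for step in range(total_steps):
    # Synthetic data: x ~ N(0, I), y = x + eps with unit-variance Gaussian noise.
    x = torch.randn(batch_size, d0)
    y = x + torch.randn(batch_size, d0)

    loss = torch.nn.functional.mse_loss(model(x), y)  # assumed objective
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    # Warmup: raise the learning rate to 0.008 after 5000 iterations
    # (per the quoted Figure 4 caption, starting directly at 0.008 diverges).
    if step == 5000:
        for group in optimizer.param_groups:
            group["lr"] = 0.008
```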