Parameter Symmetry and Noise Equilibrium of Stochastic Gradient Descent
Authors: Liu Ziyin, Mingze Wang, Hongchao Li, Lei Wu
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We apply our theory to understand specific problems and present numerical results in Section 5. All the proofs are presented in the Appendix. |
| Researcher Affiliation | Collaboration | Liu Ziyin (Massachusetts Institute of Technology; NTT Research) ziyinl@mit.edu; Mingze Wang (Peking University) mingzewang@stu.pku.edu.cn; Hongchao Li (The University of Tokyo) lhc@cat.phys.s.u-tokyo.ac.jp; Lei Wu (Peking University) leiwu@math.pku.edu.cn |
| Pseudocode | No | No pseudocode or algorithm blocks were found. |
| Open Source Code | No | From the NeurIPS paper checklist: "Does the paper provide open access to the data and code, with sufficient instructions to faithfully reproduce the main experimental results, as described in supplemental material?" Answer: [No]. Justification: "The code or data of the experiments are simple and easy to reproduce following the description in the main text." |
| Open Datasets | Yes | "Here, we give the details for the experiment in Figure 2. We train a two-layer linear net with d0 = d2 = 30 and d = 40. The input data is x ∼ N(0,1), and y = x + ϵ, where ϵ is i.i.d. Gaussian with unit variance." |
| Dataset Splits | No | The paper mentions training and testing phases but does not explicitly provide details about training/validation/test dataset splits, such as percentages or sample counts for a validation set. |
| Hardware Specification | No | From the NeurIPS paper checklist: "For each experiment, does the paper provide sufficient information on the computer resources (type of compute workers, memory, time of execution) needed to reproduce the experiments?" Answer: [No]. Justification: "The experiments can be simply conducted on personal computers." |
| Software Dependencies | No | The paper does not explicitly list specific software dependencies with version numbers (e.g., Python version, library versions like PyTorch or TensorFlow). |
| Experiment Setup | Yes | "Here, we give the details for the experiment in Figure 2. We train a two-layer linear net with d0 = d2 = 30 and d = 40. The input data is x ∼ N(0,1), and y = x + ϵ, where ϵ is i.i.d. Gaussian with unit variance." (Section A.2) "When the learning rate (η = 0.008) is too large, SGD diverges (orange line). However, when one starts training at a small learning rate (0.001) and increases η to 0.008 after 5000 iterations, the training remains stable." (Figure 4 caption) "Unless it is the independent variable, η, S and d are set to be 0.1, 100 and 2000, respectively." (Figure 8 caption) A minimal reproduction sketch of this setup is given below the table. |
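
The quoted Figure 2 setup is small enough to sketch end to end. The snippet below is a minimal NumPy sketch under stated assumptions, not the authors' code: the sample count, batch size, weight initialization, and total number of steps are not given in the quotes above and are chosen here purely for illustration, while the dimensions (d0 = d2 = 30, d = 40), the data model (x ∼ N(0,1), y = x + ϵ), and the learning-rate switch from 0.001 to 0.008 after 5000 iterations follow the quoted text.

```python
import numpy as np

rng = np.random.default_rng(0)

# Dimensions quoted from the paper's Figure 2 details: d0 = d2 = 30, hidden width d = 40.
d0, d, d2 = 30, 40, 30

# Synthetic data: x ~ N(0, 1) entrywise, y = x + eps with i.i.d. unit-variance Gaussian noise.
n_train = 1000              # sample count is an assumption; the quotes do not state it
X = rng.standard_normal((n_train, d0))
Y = X + rng.standard_normal((n_train, d2))

# Two-layer linear net y_hat = x W1 W2; the 1/sqrt(fan-in) initialization is an assumption.
W1 = rng.standard_normal((d0, d)) / np.sqrt(d0)
W2 = rng.standard_normal((d, d2)) / np.sqrt(d)

batch_size = 100            # assumed; S = 100 is only quoted for the Figure 8 experiments
warmup_steps = 5000         # learning-rate switch point quoted in the paper
lr_small, lr_large = 0.001, 0.008
n_steps = 20000             # total step count is an assumption

for step in range(n_steps):
    lr = lr_small if step < warmup_steps else lr_large

    # Sample a minibatch with replacement.
    idx = rng.integers(0, n_train, size=batch_size)
    xb, yb = X[idx], Y[idx]

    # Forward pass and residual of the 0.5 * mean-squared-error loss.
    hidden = xb @ W1                        # (batch, d)
    err = hidden @ W2 - yb                  # (batch, d2)

    # Gradients of 0.5 * mean ||x W1 W2 - y||^2 with respect to W2 and W1.
    gW2 = hidden.T @ err / batch_size       # (d, d2)
    gW1 = xb.T @ (err @ W2.T) / batch_size  # (d0, d)

    # Plain SGD update with the current learning rate.
    W1 -= lr * gW1
    W2 -= lr * gW2

    if step % 5000 == 0:
        print(f"step {step}: loss {0.5 * np.mean(np.sum(err ** 2, axis=1)):.4f}")
```

Whether the divergence at η = 0.008 without warmup reproduces here depends on choices (initialization scale, batch size) that the quoted text leaves open; the sketch only illustrates the quoted training protocol.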