Beyond the Edge of Stability via Two-step Gradient Updates
Authors: Lei Chen, Joan Bruna
ICML 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirical verification of all theorems is provided in Appendix B. We conduct an experiment on real data to show that our finding in the low-dimensional setting of Theorem 1 may generalize to the high-dimensional setting. |
| Researcher Affiliation | Academia | 1Courant Institute of Mathematical Sciences, New York University, New York 2Center for Data Science, New York University, New York. |
| Pseudocode | No | The paper does not contain any explicitly labeled "Pseudocode" or "Algorithm" blocks. |
| Open Source Code | No | The paper does not include any statements about releasing code or provide links to a code repository. |
| Open Datasets | Yes | We run 3-, 4-, and 5-layer ReLU MLPs on MNIST (LeCun et al., 1998). |
| Dataset Splits | No | The paper does not explicitly provide details about training/validation/test dataset splits, only mentioning training on MNIST and a synthetic dataset. |
| Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., CPU, GPU models) used to run the experiments. |
| Software Dependencies | No | The paper does not provide specific software dependencies with version numbers. |
| Experiment Setup | Yes | We run gradient descent with two learning rates η1 = 0.5, η2 = 2.6. ... We train such a model... with learning rate η = 2.2 = 1.1d... The learning rate is 1.02× the EoS threshold. ... with learning rate η = 1.05 and η = 1.25. ... with learning rates η = 0.5, 0.4, 0.35 and a small rate η = 0.1 (for 3-layer). (Illustrative sketches below.) |
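
Since the paper releases no code, the following is a minimal sketch of the "beyond the Edge of Stability via two-step updates" phenomenon referenced in the rows above. The objective f(x) = sqrt(1 + x^2) is an assumed toy (its maximal sharpness is f''(0) = 1, so the classical EoS threshold is 2/1 = 2), not the model analyzed in the paper's Theorem 1; only the step sizes 0.5 and 2.6 are taken from the Experiment Setup row.

```python
# Hedged toy sketch: plain gradient descent on an assumed scalar loss f(x) = sqrt(1 + x^2).
# The EoS threshold here is 2 (max sharpness 1), so eta = 2.6 is "beyond the edge".
import numpy as np

def gd_iterates(eta, x0=1.5, steps=30):
    """Gradient descent iterates on f(x) = sqrt(1 + x^2); f'(x) = x / sqrt(1 + x^2)."""
    xs = [x0]
    for _ in range(steps):
        x = xs[-1]
        xs.append(x - eta * x / np.sqrt(1.0 + x * x))
    return np.array(xs)

for eta in (0.5, 2.6):  # step sizes quoted in the Experiment Setup row
    xs = gd_iterates(eta)
    print(f"eta = {eta}")
    print("  last one-step iterates :", np.round(xs[-4:], 4))
    print("  last two-step iterates :", np.round(xs[::2][-4:], 4))

# With eta = 0.5 (below threshold) the iterates converge to the minimizer at 0.
# With eta = 2.6 (beyond threshold) the one-step iterates settle into a stable
# period-2 oscillation between roughly +/- sqrt(eta^2/4 - 1), while every second
# iterate -- the two-step map emphasized in the paper's title -- converges.
```

The two-step view is exactly why the second printout stabilizes: composing the one-step map with itself turns the period-2 orbit into an attracting fixed point.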
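For the MNIST rows, the paper describes 3-, 4-, and 5-layer ReLU MLPs trained by gradient descent at the quoted learning rates but gives no implementation. Below is a hedged PyTorch sketch of that setup; the hidden width (200), the 5,000-example subset, the cross-entropy loss, and the step count are assumptions introduced here for illustration, not values from the paper.

```python
# Minimal sketch of full-batch gradient descent on a 3-layer ReLU MLP for MNIST.
# Assumptions (not from the paper): hidden width 200, 5k-example subset, CE loss.
import torch
import torch.nn as nn
from torchvision import datasets, transforms

torch.manual_seed(0)

# Fixed subset of MNIST so that full-batch gradient descent stays cheap.
train = datasets.MNIST("./data", train=True, download=True,
                       transform=transforms.ToTensor())
X = torch.stack([train[i][0].view(-1) for i in range(5000)])  # (5000, 784)
y = torch.tensor([train[i][1] for i in range(5000)])          # (5000,)

# 3-layer ReLU MLP; depth matches the smallest model quoted, width is assumed.
model = nn.Sequential(
    nn.Linear(784, 200), nn.ReLU(),
    nn.Linear(200, 200), nn.ReLU(),
    nn.Linear(200, 10),
)
loss_fn = nn.CrossEntropyLoss()

eta = 0.5  # one of the learning rates quoted in the Experiment Setup row
for step in range(500):
    loss = loss_fn(model(X), y)
    model.zero_grad()
    loss.backward()
    with torch.no_grad():
        for p in model.parameters():
            p -= eta * p.grad  # plain full-batch gradient descent, no momentum
    if step % 50 == 0:
        print(f"step {step:4d}  loss {loss.item():.4f}")
```

Swapping `eta` among the quoted values (0.5, 0.4, 0.35, 0.1) reproduces the kind of learning-rate sweep described in the row, though the remaining hyperparameters here are guesses.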