Beyond the Edge of Stability via Two-step Gradient Updates

Authors: Lei Chen, Joan Bruna

ICML 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Empirical verification of all theorems is provided in Appendix B. We conduct an experiment on real data to show that our finding in the low-dimensional setting of Theorem 1 can generalize to the high-dimensional setting.
Researcher Affiliation | Academia | Courant Institute of Mathematical Sciences, New York University, New York; Center for Data Science, New York University, New York.
Pseudocode | No | The paper does not contain any explicitly labeled "Pseudocode" or "Algorithm" blocks.
Open Source Code | No | The paper does not include any statements about releasing code or provide links to a code repository.
Open Datasets | Yes | We run 3-, 4-, and 5-layer ReLU MLPs on MNIST (LeCun et al., 1998).
Dataset Splits | No | The paper does not explicitly provide details about training/validation/test dataset splits, only mentioning training on MNIST and a synthetic dataset.
Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., CPU or GPU models) used to run the experiments.
Software Dependencies | No | The paper does not provide specific software dependencies with version numbers.
Experiment Setup | Yes | We run gradient descent with two learning rates η1 = 0.5, η2 = 2.6. ... We train such a model... with learning rate η = 2.2 = 1.1d... The learning rate is 1.02× the EoS threshold. ... with learning rate η = 1.05 and η = 1.25. ... with learning rates η = 0.5, 0.4, 0.35 and a small rate η = 0.1 (for 3-layer). (See the illustrative sketch below.)
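The learning rates in the setup row are chosen relative to the edge-of-stability (EoS) threshold, i.e., the point where the sharpness (top Hessian eigenvalue of the training loss) reaches 2/η. The following is a minimal sketch, assuming a PyTorch implementation, of full-batch gradient descent on a 3-layer ReLU MLP over a small MNIST subset that tracks sharpness against 2/η. The hidden width, subset size, loss function, step counts, and the choice of η = 0.5 are illustrative assumptions; the paper does not release code, so this is not the authors' implementation.

```python
# Hedged sketch: full-batch gradient descent on a 3-layer ReLU MLP (MNIST subset),
# tracking the sharpness (top Hessian eigenvalue) against the EoS threshold 2/eta.
# Width, subset size, loss, and step counts are assumptions for illustration only.
import torch
import torch.nn as nn
from torchvision import datasets, transforms

torch.manual_seed(0)

# Small MNIST subset so full-batch gradients are cheap (subset size is an assumption).
mnist = datasets.MNIST("data", train=True, download=True,
                       transform=transforms.ToTensor())
x = torch.stack([mnist[i][0].view(-1) for i in range(1000)])
y = torch.tensor([mnist[i][1] for i in range(1000)])

# 3-layer ReLU MLP, matching the depth quoted above (width is an assumption).
model = nn.Sequential(nn.Linear(784, 200), nn.ReLU(),
                      nn.Linear(200, 200), nn.ReLU(),
                      nn.Linear(200, 10))
loss_fn = nn.CrossEntropyLoss()
params = [p for p in model.parameters() if p.requires_grad]

def sharpness(n_iters=20):
    """Estimate the top Hessian eigenvalue via power iteration on Hessian-vector products."""
    loss = loss_fn(model(x), y)
    grads = torch.autograd.grad(loss, params, create_graph=True)
    v = [torch.randn_like(p) for p in params]
    for _ in range(n_iters):
        hv = torch.autograd.grad(grads, params, grad_outputs=v, retain_graph=True)
        norm = torch.sqrt(sum((h * h).sum() for h in hv))
        v = [h / norm for h in hv]
    hv = torch.autograd.grad(grads, params, grad_outputs=v)
    # Rayleigh quotient v^T H v with unit-norm v approximates the top eigenvalue.
    return sum((h * vi).sum() for h, vi in zip(hv, v)).item()

eta = 0.5  # one of the learning rates quoted in the setup row (choice is illustrative)
for step in range(200):
    loss = loss_fn(model(x), y)
    grads = torch.autograd.grad(loss, params)
    with torch.no_grad():
        for p, g in zip(params, grads):
            p -= eta * g  # plain full-batch gradient descent step
    if step % 20 == 0:
        print(f"step {step:3d}  loss {loss.item():.4f}  "
              f"sharpness {sharpness():.2f}  EoS threshold {2 / eta:.2f}")
```

Power iteration on Hessian-vector products avoids forming the Hessian explicitly. If training enters the EoS regime, the printed sharpness would be expected to hover near or above 2/η while the loss decreases non-monotonically, which is the regime the paper analyzes via two-step gradient updates.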