Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026.
Efficiently Escaping Saddle Points under Generalized Smoothness via Self-Bounding Regularity
Authors: Daniel Cao, August Chen, Karthik Sridharan, Benjamin Tang
NeurIPS 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | 3.7 Practical Implications and Simulations: Our results show under generalizations of smoothness, unlike with Lipschitz gradient/Hessian, the larger the loss is at initialization (larger $F(w_0)$) and larger self-bounding functions $\rho_1(\cdot)$ shrink the window for choosing a working $\eta$. Specifically, with larger loss at initialization, the smaller the largest working step size is, in contrast to optimizing smooth functions. This implies in practice, for losses with non-Lipschitz gradient/Hessian, one should tune $\eta$ based on suboptimality at initialization. In Section G, we validate this finding through simulations with GD and SGD on several natural smooth and generalized smooth functions, namely $F(w) = \lVert Aw \rVert^p$ for $p = 2, 3, 4, 5, 6$. Our simulations show the above theoretical conclusions match behavior in practice, validating the practical implications of our theoretical results on which step sizes successfully optimize generalized smooth functions. |
| Researcher Affiliation | Academia | Department of Computer Science, Cornell University EMAIL |
| Pseudocode | Yes | Perturbed GD: This algorithm, formally written in Algorithm 1, Section D, is as follows. |
| Open Source Code | Yes | Answer: [Yes] Justification: We provide the code in the supplementary material. The code has sufficient instructions to reproduce the main experimental results. |
| Open Datasets | No | G.1 Synthetic Simulations with GD. Simulation Details: We consider $F(w) = \lVert Aw \rVert^p$ for $p = 2, 3, 4, 5, 6$, where $A = \mathrm{diag}(1/2, 1)$. |
| Dataset Splits | No | Initialization: For each step size $\eta_i$, we initialize GD at 4 distributions $\pi_j = \mathcal{N}(0, c_j I_{20})$ for $c_j \in \{2.5, 5, 7.5, 10\}$. For each of these 4 distributions $\pi_j$, we draw 100 points $w_0 \sim \pi_j$ to use as our initialization. |
| Hardware Specification | Yes | The simulations for Subsection G.1 were run on a Jupyter notebook in Python in Google Colab Pro, connected to a single NVIDIA T4 GPU. |
| Software Dependencies | No | The simulations for Subsection G.1 were run on a Jupyter notebook in Python in Google Colab Pro, connected to a single NVIDIA T4 GPU. |
| Experiment Setup | Yes | For each $p = 2, 3, 4, 5, 6$, we consider the following settings for GD. Step sizes: We consider 30 step sizes $\{\eta_i\}_{i=1}^{30}$, $\eta_1 < \cdots < \eta_{30}$, evenly spaced on a log scale between $10^{-8}$ and $10^{1}$, inclusive. Initialization: For each step size $\eta_i$, we initialize GD at 4 distributions $\pi_j = \mathcal{N}(0, c_j I_{20})$ for $c_j \in \{2.5, 5, 7.5, 10\}$. For each of these 4 distributions $\pi_j$, we draw 100 points $w_0 \sim \pi_j$ to use as our initialization. Number of steps: For each $\eta_i$ and each $w_0 \sim \pi_j$, we run GD initialized at $w_0$ with step size $\eta_i$ for $T = 1000$ iterations. |
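
The step-size sweep quoted in the Experiment Setup row can be sketched roughly as follows. This is an illustrative reconstruction, not the authors' supplementary code. It assumes the loss is the norm power ‖Aw‖^p with a diagonal A, and it reads A = diag(1/2, 1) as a 2-dimensional problem; the extracted text does not pin down the dimension, so `A_DIAG` is a placeholder to adjust. The blow-up threshold, the success tolerance `tol`, and all function names are hypothetical choices made for this sketch.

```python
import math
import random

# Assumption: A = diag(1/2, 1) acting on a 2-dimensional w; change as needed.
A_DIAG = (0.5, 1.0)


def loss(w, p):
    """F(w) = ||A w||^p for the diagonal matrix A."""
    norm = math.sqrt(sum((a * x) ** 2 for a, x in zip(A_DIAG, w)))
    return norm ** p


def grad(w, p):
    """Gradient of F, which is p * ||A w||^(p-2) * A^T A w."""
    norm = math.sqrt(sum((a * x) ** 2 for a, x in zip(A_DIAG, w)))
    scale = p * norm ** (p - 2)
    return [scale * a * a * x for a, x in zip(A_DIAG, w)]


def run_gd(w0, eta, p, steps=1000, blowup=1e6):
    """Run plain GD; return the final loss, or math.inf if an iterate blows up."""
    w = list(w0)
    for _ in range(steps):
        g = grad(w, p)
        w = [x - eta * gx for x, gx in zip(w, g)]
        # Hypothetical divergence rule: stop once any coordinate is huge.
        if any(not math.isfinite(x) or abs(x) > blowup for x in w):
            return math.inf
    return loss(w, p)


def log_spaced(lo_exp=-8.0, hi_exp=1.0, n=30):
    """n step sizes evenly spaced on a log scale from 10^lo_exp to 10^hi_exp."""
    return [10 ** (lo_exp + (hi_exp - lo_exp) * i / (n - 1)) for i in range(n)]


def largest_working_eta(p, c, n_inits=100, steps=1000, tol=1e-3, seed=0):
    """Largest grid step size reaching loss <= tol from every sampled start."""
    rng = random.Random(seed)
    std = math.sqrt(c)  # N(0, c I) has per-coordinate standard deviation sqrt(c)
    inits = [[rng.gauss(0.0, std) for _ in A_DIAG] for _ in range(n_inits)]
    best = None
    for eta in log_spaced():
        if all(run_gd(w0, eta, p, steps) <= tol for w0 in inits):
            best = eta
    return best
```

Calling `run_gd([0.0, 1.0], 0.01, 4)` and `run_gd([0.0, 10.0], 0.01, 4)` illustrates the quoted finding on a small scale: with p = 4, the same step size that makes progress from the small start diverges from the larger one, since the effective curvature grows with the loss. The success rule in `largest_working_eta` is only one possible definition of a "working" step size; the source does not state the criterion used in the paper.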