Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Efficiently Escaping Saddle Points under Generalized Smoothness via Self-Bounding Regularity

Authors: Daniel Cao, August Chen, Karthik Sridharan, Benjamin Tang

NeurIPS 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | 3.7 Practical Implications and Simulations. Our results show under generalizations of smoothness, unlike with Lipschitz gradient/Hessian, the larger the loss is at initialization (larger $F(w_0)$) and larger self-bounding functions $\rho_1(\cdot)$ shrink the window for choosing a working $\eta$. Specifically, with larger loss at initialization, the smaller the largest working step size is, in contrast to optimizing smooth functions. This implies in practice, for losses with non-Lipschitz gradient/Hessian, one should tune $\eta$ based on suboptimality at initialization. In Section G, we validate this finding through simulations with GD and SGD on several natural smooth and generalized smooth functions, namely $F(w) = \|Aw\|^p$ for $p = 2, 3, 4, 5, 6$. Our simulations show the above theoretical conclusions match behavior in practice, validating the practical implications of our theoretical results on which step sizes successfully optimize generalized smooth functions.
Researcher Affiliation | Academia | Department of Computer Science, Cornell University EMAIL
Pseudocode | Yes | Perturbed GD: This algorithm, formally written in Algorithm 1, Section D, is as follows.
Open Source Code | Yes | Answer: [Yes] Justification: We provide the code in the supplementary material. The code has sufficient instructions to reproduce the main experimental results.
Open Datasets | No | G.1 Synthetic Simulations with GD. Simulation Details: We consider $F(w) = \|Aw\|^p$ for $p = 2, 3, 4, 5, 6$, where $A = \mathrm{diag}(1/2, 1)$.
Dataset Splits | No | Initialization: For each step size $\eta_i$, we initialize GD at 4 distributions $\pi_j = \mathcal{N}(0, c_j I_{20})$ for $c_j \in \{2.5, 5, 7.5, 10\}$. For each of these 4 distributions $\pi_j$, we draw 100 points $w_0 \sim \pi_j$ to use as our initialization.
Hardware Specification | Yes | The simulations for Subsection G.1 were run on a Jupyter notebook in Python in Google Colab Pro, connected to a single NVIDIA T4 GPU.
Software Dependencies | No | The simulations for Subsection G.1 were run on a Jupyter notebook in Python in Google Colab Pro, connected to a single NVIDIA T4 GPU.
Experiment Setup | Yes | For each $p = 2, 3, 4, 5, 6$, we consider the following settings for GD. Step sizes: We consider 30 step sizes $\{\eta_i\}_{i=1}^{30}$, $\eta_1 < \cdots < \eta_{30}$, evenly spaced on a log scale between $10^{-8}$ and $10^{1}$, inclusive. Initialization: For each step size $\eta_i$, we initialize GD at 4 distributions $\pi_j = \mathcal{N}(0, c_j I_{20})$ for $c_j \in \{2.5, 5, 7.5, 10\}$. For each of these 4 distributions $\pi_j$, we draw 100 points $w_0 \sim \pi_j$ to use as our initialization. Number of steps: For each $\eta_i$ and each $w_0 \sim \pi_j$, we run GD initialized at $w_0$ with step size $\eta_i$ for $T = 1000$ iterations.
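
The GD sweep described in the Experiment Setup row could be sketched roughly as follows. This is an illustrative reconstruction and not the authors' supplementary code. It makes several assumptions: the objective is read as F(w) = ||Aw||^p with the Euclidean norm (the formula is garbled in the extracted text), the problem is taken to be 2-dimensional so that it matches A = diag(1/2, 1) (the covariance subscript in the source may indicate a different dimension), and a run counts as "working" if it stays bounded and does not increase the loss, which is a criterion chosen here for illustration. The names `run_gd`, `sweep`, and `blowup` are hypothetical.

```python
import numpy as np

A = np.diag([0.5, 1.0])  # A = diag(1/2, 1) as stated; the 2-D setting is an assumption


def loss(W, p):
    """F(w) = ||A w||^p for each row w of W (assumed reading of the objective)."""
    return np.linalg.norm(W @ A.T, axis=1) ** p


def grad(W, p):
    """Gradient p * ||A w||^(p-2) * A^T A w for each row w of W."""
    norms = np.linalg.norm(W @ A.T, axis=1)
    return p * (norms ** (p - 2))[:, None] * (W @ (A.T @ A))


def run_gd(W0, p, eta, T=1000, blowup=1e6):
    """Run GD from every row of W0 in parallel.

    Returns a boolean mask: True where the run stayed bounded and ended with
    a loss no larger than its initial loss (illustrative success criterion).
    """
    W0 = np.asarray(W0, dtype=float)
    W = W0.copy()
    alive = np.ones(len(W), dtype=bool)
    for _ in range(T):
        # Only update runs that have not blown up, to avoid overflow.
        W[alive] = W[alive] - eta * grad(W[alive], p)
        alive &= np.linalg.norm(W, axis=1) <= blowup
    ok = alive.copy()
    ok[alive] = loss(W[alive], p) <= loss(W0[alive], p)
    return ok


def sweep(p, scales=(2.5, 5.0, 7.5, 10.0), n_init=100, T=1000, seed=0):
    """For each variance scale c, return the largest grid step size for which
    every sampled initialization w0 ~ N(0, c I) gives a working run
    (None if no grid step size works)."""
    rng = np.random.default_rng(seed)
    etas = np.logspace(-8, 1, 30)  # 30 step sizes, log-spaced in [1e-8, 1e1]
    result = {}
    for c in scales:
        W0 = np.sqrt(c) * rng.standard_normal((n_init, A.shape[1]))
        working = [eta for eta in etas if run_gd(W0, p, eta, T).all()]
        result[c] = max(working) if working else None
    return result


if __name__ == "__main__":
    for p in (2, 3, 4, 5, 6):
        print(p, sweep(p, n_init=20, T=200))
```

Under these assumptions, one would expect the largest working step size to be insensitive to the initialization scale for p = 2 and to shrink as the scale grows for p > 2, which is the qualitative behavior the quoted passage describes. The reduced `n_init` and `T` in the usage lines only keep the demo fast; the quoted setup uses 100 initializations and T = 1000.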