Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Convergence of Clipped SGD on Convex $(L_0,L_1)$-Smooth Functions

Authors: Ofir Gaash, Kfir Y. Levy, Yair Carmon

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We perform empirical experiments to examine our theory and algorithmic choices. (Section 4, Experiments)
Researcher Affiliation | Academia | Tel Aviv University, EMAIL and EMAIL. Technion, EMAIL.
Pseudocode | Yes | Algorithm 1: Clipped SGD With Double Sampling
Open Source Code | Yes | The code for reproducing the experiments is available at github.com/formll/clipped-sgd-under-generalized-smoothness.
Open Datasets | Yes | Our experiments use the California Housing dataset [28] and the Parkinsons Telemonitoring dataset [36], which are published under CC0 and CC-BY 4.0 licenses, respectively.
Dataset Splits | No | There is no test set since measuring generalization is irrelevant in this paper.
Hardware Specification | Yes | All experiments provided in this paper were run on Google Colab (with a free account) using an NVIDIA T4 GPU.
Software Dependencies | No | The paper does not provide specific software dependencies with version numbers (e.g., Python, PyTorch/TensorFlow versions, or other libraries).
Experiment Setup | Yes | We determine the clipping threshold $c$ of each method by tuning it, avoiding reliance on theoretical quantities from the definitions in Table 1. Similarly, we modify the parameter $\eta_t$ by replacing theoretical quantities with some tunable variable, which we denote as lr. For methods with a fixed stepsize, we simply set $\eta = \mathrm{lr}$. For methods based on Adaptive SGD we set $\eta_t = \mathrm{lr} \cdot \left(\sum_{i=0}^{t} \alpha_i^2 \lVert g_i \rVert^2\right)^{-1/2}$, and for implicit clipping we set $\eta_t = \mathrm{lr} \cdot c/(c + \lVert g_t^c \rVert)$ (see Section 3.1 on Zhang et al. [44] for intuition). We tune lr and $c$ by performing a two-level, two-dimensional grid search. In the first-level grid, the values are geometrically spaced by a factor of 10: The values for $c$ are $(10^{2}, \ldots, 10^{7})$. The values for lr are $(10^{-10}, \ldots, 10^{-5})$ for SGD, $(10^{-7}, \ldots, 10^{-2})$ for clipped SGD, and $(10^{-3}, \ldots, 10^{2})$ for both Adaptive SGD and clipped Adaptive SGD. We verify that the best candidate is never at the edge of the grid. Denoting the best candidate as $(\mathrm{lr}_1, c_1)$, the second-level grid is defined as $\{(\mathrm{lr}, c) \mid \mathrm{lr} \in (\tfrac{1}{2}\mathrm{lr}_1, \mathrm{lr}_1, 2\mathrm{lr}_1, 4\mathrm{lr}_1),\ c \in (\tfrac{1}{2}c_1, c_1, 2c_1, 4c_1)\}$.
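
The tuning procedure quoted in the Experiment Setup entry can be illustrated with a short Python sketch. It is not taken from the authors' repository. The function names, the `loss_fn(lr, c)` callable (a stand-in for "train with these hyperparameters and return the final loss"), and the choice to raise an error when the best coarse candidate sits on the grid edge are all assumptions made for illustration. The Adaptive SGD step size assumes the extracted formula reads lr times the sum of alpha_i^2 * ||g_i||^2 raised to the power -1/2, and the implicit-clipping step size assumes the quantity added to c is the norm of the clipping-sample gradient; both the negative exponent and the norms are readings of garbled text.

```python
import itertools
import math


def first_level_grid(lr_exponents, c_exponents=range(2, 8)):
    """Coarse grid: all (lr, c) pairs, each axis spaced by a factor of 10."""
    lrs = [10.0 ** e for e in lr_exponents]
    cs = [10.0 ** e for e in c_exponents]
    return list(itertools.product(lrs, cs))


def second_level_grid(lr1, c1, factors=(0.5, 1.0, 2.0, 4.0)):
    """Fine grid: multiples of the best coarse candidate (lr1, c1)."""
    return [(f_lr * lr1, f_c * c1) for f_lr in factors for f_c in factors]


def is_on_edge(best, grid):
    """True if `best` has the smallest or largest lr or c value in `grid`."""
    lrs = sorted({lr for lr, _ in grid})
    cs = sorted({c for _, c in grid})
    return best[0] in (lrs[0], lrs[-1]) or best[1] in (cs[0], cs[-1])


def tune(loss_fn, lr_exponents):
    """Two-level search. `loss_fn(lr, c)` is a hypothetical training routine."""
    coarse = first_level_grid(lr_exponents)
    best = min(coarse, key=lambda p: loss_fn(*p))
    if is_on_edge(best, coarse):
        # Assumption: treat an edge optimum as a signal to widen the grid.
        raise ValueError("best coarse candidate is on the grid edge")
    fine = second_level_grid(*best)
    return min(fine, key=lambda p: loss_fn(*p))


def adaptive_stepsize(lr, alphas, grad_norms):
    """lr / sqrt(sum_i alpha_i^2 * ||g_i||^2), summing over steps 0..t."""
    total = sum((a * g) ** 2 for a, g in zip(alphas, grad_norms))
    return lr / math.sqrt(total)


def implicit_clipping_stepsize(lr, c, clip_grad_norm):
    """lr * c / (c + ||g_t^c||): close to lr for small gradients, shrinking as they grow."""
    return lr * c / (c + clip_grad_norm)
```

For clipped SGD, for example, `tune(loss_fn, range(-7, -1))` would evaluate the 36 coarse candidates followed by 16 fine candidates around the best one.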