Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
Convergence of Clipped SGD on Convex $(L_0,L_1)$-Smooth Functions
Authors: Ofir Gaash, Kfir Y. Levy, Yair Carmon
NeurIPS 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We perform empirical experiments to examine our theory and algorithmic choices. (Section 4, Experiments) |
| Researcher Affiliation | Academia | Tel Aviv University, EMAIL and EMAIL. Technion, EMAIL. |
| Pseudocode | Yes | Algorithm 1: Clipped SGD With Double Sampling |
| Open Source Code | Yes | The code for reproducing the experiments is available at github.com/formll/clipped-sgd-under-generalized-smoothness. |
| Open Datasets | Yes | Our experiments use the California Housing dataset [28] and the Parkinsons Telemonitoring dataset [36], which are published under CC0 and CC-BY 4.0 licenses, respectively. |
| Dataset Splits | No | There is no test set since measuring generalization is irrelevant in this paper. |
| Hardware Specification | Yes | All experiments provided in this paper were run on Google Colab (with a free account) using an NVIDIA T4 GPU. |
| Software Dependencies | No | The paper does not provide specific software dependencies with version numbers (e.g., Python, PyTorch/TensorFlow versions, or other libraries). |
| Experiment Setup | Yes | We determine the clipping threshold $c$ of each method by tuning it, avoiding reliance on theoretical quantities from the definitions in Table 1. Similarly, we modify the parameter $\eta_t$ by replacing theoretical quantities with some tunable variable, which we denote as lr. For methods with a fixed stepsize, we simply set $\eta = \mathrm{lr}$. For methods based on Adaptive SGD we set $\eta_t = \mathrm{lr} \left(\sum_{i=0}^{t} \alpha_i^2 \lVert g_i \rVert^2\right)^{-1/2}$, and for implicit clipping we set $\eta_t = \mathrm{lr} \cdot c/(c + \lVert g_t^c \rVert)$ (see Section 3.1 on Zhang et al. [44] for intuition). We tune lr and $c$ by performing a two-level, two-dimensional grid search. In the first-level grid, the values are geometrically spaced by a factor of 10: The values for $c$ are $(10^{2}, \ldots, 10^{7})$. The values for lr are $(10^{-10}, \ldots, 10^{-5})$ for SGD, $(10^{-7}, \ldots, 10^{-2})$ for clipped SGD, and $(10^{-3}, \ldots, 10^{2})$ for both Adaptive SGD and clipped Adaptive SGD. We verify that the best candidate is never at the edge of the grid. Denoting the best candidate as $(\mathrm{lr}_1, c_1)$, the second-level grid is defined as $\{(\mathrm{lr}, c) \mid \mathrm{lr} \in (\tfrac{1}{2}\mathrm{lr}_1, \mathrm{lr}_1, 2\mathrm{lr}_1, 4\mathrm{lr}_1),\ c \in (\tfrac{1}{2}c_1, c_1, 2c_1, 4c_1)\}$. |
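
The two-level grid search quoted in the Experiment Setup row can be sketched roughly as follows. This is a minimal illustration written for this summary, not code from the authors' repository: the function names (`first_level_grid`, `second_level_grid`, `on_edge`, `two_level_search`) and the `toy_loss` objective are hypothetical, and `evaluate(lr, c)` stands in for whatever training run returns the final objective value for a given pair.

```python
import itertools
import math


def first_level_grid(lr_exponents, c_exponents=range(2, 8)):
    """Coarse grid: all (lr, c) pairs with values spaced by a factor of 10."""
    lrs = [10.0 ** e for e in lr_exponents]
    cs = [10.0 ** e for e in c_exponents]  # default: 10^2, ..., 10^7
    return list(itertools.product(lrs, cs))


def second_level_grid(lr1, c1, factors=(0.5, 1.0, 2.0, 4.0)):
    """Fine grid around the best coarse candidate (lr1, c1)."""
    return [(f_lr * lr1, f_c * c1) for f_lr in factors for f_c in factors]


def on_edge(best, grid):
    """True if the candidate sits on the boundary of the grid in either dimension."""
    lrs = sorted({lr for lr, _ in grid})
    cs = sorted({c for _, c in grid})
    lr, c = best
    return lr in (lrs[0], lrs[-1]) or c in (cs[0], cs[-1])


def two_level_search(evaluate, lr_exponents, c_exponents=range(2, 8)):
    """Run the coarse search, check the edge condition, then refine."""
    coarse = first_level_grid(lr_exponents, c_exponents)
    best1 = min(coarse, key=lambda p: evaluate(*p))
    if on_edge(best1, coarse):
        raise ValueError("best first-level candidate is on the grid edge")
    fine = second_level_grid(*best1)
    return min(fine, key=lambda p: evaluate(*p))


def toy_loss(lr, c):
    """Hypothetical stand-in for a training run; minimized near lr=10^-4.3, c=10^4.2."""
    return (math.log10(lr) + 4.3) ** 2 + (math.log10(c) - 4.2) ** 2


if __name__ == "__main__":
    # Exponents -7..-2 match the first-level lr grid quoted for clipped SGD.
    print(two_level_search(toy_loss, lr_exponents=range(-7, -1)))
```

Raising an error when the coarse optimum lands on the boundary is one way to mirror the quoted check that "the best candidate is never at the edge of the grid"; the source does not say how the authors performed that check.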