Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Convergence of Clipped SGD on Convex $(L_0,L_1)$-Smooth Functions

Authors: Ofir Gaash, Kfir Y. Levy, Yair Carmon

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We perform empirical experiments to examine our theory and algorithmic choices. (Section 4, Experiments)
Researcher Affiliation	Academia	Tel Aviv University, EMAIL and EMAIL. Technion, EMAIL.
Pseudocode	Yes	Algorithm 1: Clipped SGD With Double Sampling
Open Source Code	Yes	The code for reproducing the experiments is available at github.com/formll/clipped-sgd-under-generalized-smoothness.
Open Datasets	Yes	Our experiments use the California Housing dataset [28] and the Parkinsons Telemonitoring dataset [36], which are published under CC0 and CC-BY 4.0 licenses, respectively.
Dataset Splits	No	There is no test set since measuring generalization is irrelevant in this paper.
Hardware Specification	Yes	All experiments provided in this paper were run on Google Colab (with a free account) using an NVIDIA T4 GPU.
Software Dependencies	No	The paper does not provide specific software dependencies with version numbers (e.g., Python, PyTorch/TensorFlow versions, or other libraries).
Experiment Setup	Yes	We determine the clipping threshold $c$ of each method by tuning it, avoiding reliance on theoretical quantities from the definitions in Table 1. Similarly, we modify the parameter $\eta_t$ by replacing theoretical quantities with some tunable variable, which we denote as lr. For methods with a fixed stepsize, we simply set $\eta = \mathrm{lr}$. For methods based on Adaptive SGD we set $\eta_t = \mathrm{lr} \left(\sum_{i=0}^{t} \alpha_i^2 \|g_i\|^2\right)^{-1/2}$, and for implicit clipping we set $\eta_t = \mathrm{lr} \cdot c/(c + \|g_t^c\|)$ (see Section 3.1 on Zhang et al. [44] for intuition). We tune lr and $c$ by performing a two-level, two-dimensional grid search. In the first-level grid, the values are geometrically spaced by a factor of 10: The values for $c$ are $(10^{2}, \ldots, 10^{7})$. The values for lr are $(10^{-10}, \ldots, 10^{-5})$ for SGD, $(10^{-7}, \ldots, 10^{-2})$ for clipped SGD, and $(10^{-3}, \ldots, 10^{2})$ for both Adaptive SGD and clipped Adaptive SGD. We verify that the best candidate is never at the edge of the grid. Denoting the best candidate as $(\mathrm{lr}_1, c_1)$, the second-level grid is defined as $\{(\mathrm{lr}, c) \mid \mathrm{lr} \in (\tfrac{1}{2}\mathrm{lr}_1, \mathrm{lr}_1, 2\mathrm{lr}_1, 4\mathrm{lr}_1),\ c \in (\tfrac{1}{2}c_1, c_1, 2c_1, 4c_1)\}$.
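
The stepsize rules and the two-level grid search quoted in the Experiment Setup row can be illustrated with a short sketch. This is not the authors' code (that lives in the repository linked above); every function name here is hypothetical, `evaluate` stands in for a full training run that returns a score to minimize, and `toy_loss` is an invented objective used only so the example runs. The sketch also assumes that the adaptive stepsize carries an exponent of -1/2 and that the gradient terms are norms, since the extracted text appears to have dropped minus signs and norm bars.

```python
import math
from itertools import product

# Second-level multipliers (1/2, 1, 2, 4) as listed in the Experiment Setup row.
FACTORS = (0.5, 1.0, 2.0, 4.0)


def adaptive_stepsize(lr, alphas, grad_norms):
    """Assumed form: eta_t = lr * (sum_i alpha_i^2 * ||g_i||^2)^(-1/2)."""
    total = sum((a * g) ** 2 for a, g in zip(alphas, grad_norms))
    return lr / math.sqrt(total)


def implicit_clipping_stepsize(lr, c, clip_grad_norm):
    """Assumed form: eta_t = lr * c / (c + ||g_t^c||)."""
    return lr * c / (c + clip_grad_norm)


def second_level_grid(lr1, c1):
    """All 16 pairs (f * lr1, g * c1) with f, g drawn from FACTORS."""
    return [(f * lr1, g * c1) for f, g in product(FACTORS, FACTORS)]


def two_level_search(evaluate, lr_exponents, c_exponents=range(2, 8)):
    """Coarse factor-of-10 grid, then a finer grid around the coarse winner.

    `evaluate(lr, c)` is a hypothetical callback; lower is treated as better,
    which is an assumption, since the source only says "best candidate".
    """
    lr_values = [10.0 ** e for e in lr_exponents]
    c_values = [10.0 ** e for e in c_exponents]
    lr1, c1 = min(product(lr_values, c_values), key=lambda p: evaluate(*p))
    # Mirror the check that the best candidate is not on the grid edge.
    if lr1 in (lr_values[0], lr_values[-1]) or c1 in (c_values[0], c_values[-1]):
        raise ValueError("best first-level candidate lies on the grid edge")
    return min(second_level_grid(lr1, c1), key=lambda p: evaluate(*p))


def toy_loss(lr, c):
    """Invented stand-in objective, smooth in log10(lr) and log10(c)."""
    return (math.log10(lr) + 4.2) ** 2 + (math.log10(c) - 4.4) ** 2


# lr exponents -7..-2 correspond to the range quoted for clipped SGD.
best_lr, best_c = two_level_search(toy_loss, range(-7, -1))
```

Raising an error when the coarse winner sits on the boundary is one possible way to enforce the edge check; the source states only that the check was performed, not how.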