Revisiting Gradient Clipping: Stochastic bias and tight convergence guarantees

Authors: Anastasia Koloskova, Hadrien Hendrikx, Sebastian U Stich

ICML 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | …illustrate these results with experiments. In this section, we investigate the performance of gradient clipping on logistic regression on the w1a dataset (Platt, 1998), and on the artificial quadratic function f(x) = E_{ξ ∼ χ²(1)}[ f(x, ξ) := (L/2)‖x‖² + ⟨x, ξ⟩ ], where x ∈ ℝ^100, we choose L = 0.1, and χ²(1) is a (coordinate-wise) chi-squared distribution with 1 degree of freedom. The goal is to highlight our theoretical results.
Researcher Affiliation | Academia | EPFL, Switzerland; Inria Grenoble, France (work done in part while at EPFL); CISPA Helmholtz Center for Information Security, Germany.
Pseudocode | No | The paper describes the clipped gradient descent algorithm using mathematical equations (e.g., 'x_{t+1} = x_t − η g_t, with g_t = clip_c(∇f_ξ(x_t))'), but it does not present this as a formal pseudocode block or algorithm listing. (A minimal sketch of this update appears below the table.)
Open Source Code | No | The paper does not include any explicit statement about releasing source code or provide a link to a code repository for the methodology described.
Open Datasets | Yes | In this section, we investigate the performance of gradient clipping on logistic regression on the w1a dataset (Platt, 1998). (A batch-size-1 sketch on w1a appears below the table.)
Dataset Splits | No | The paper mentions using the 'w1a dataset' and an 'artificial quadratic function' for experiments, but it does not specify any dataset splits (e.g., percentages or sample counts for training, validation, or testing).
Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., GPU or CPU models, memory specifications) used to run the experiments.
Software Dependencies | No | The paper does not specify any software dependencies with version numbers (e.g., specific libraries, frameworks, or programming language versions) used for the experiments.
Experiment Setup | Yes | In Figures (a) and (b) we see that as soon as the clipping threshold is smaller than or equal to the target gradient norm ϵ, the convergence speed is affected only by a constant. In Figure (c), we see that as the clipping threshold c decreases, the best tuned stepsize (tuned to reach ϵ = 10⁻² fastest) decreases. Logistic regression on w1a dataset (batch size = 1). (A sketch of such a threshold/stepsize sweep appears below the table.)
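
To make the 'Pseudocode' and 'Research Type' rows concrete, here is a minimal NumPy sketch of the update the paper states only as an equation, x_{t+1} = x_t − η·clip_c(∇f_ξ(x_t)), run on the quoted synthetic quadratic (L = 0.1, x ∈ ℝ^100, ξ ~ χ²(1) coordinate-wise). This is our own reconstruction, not the authors' code; the starting point, step count, and seed are assumptions.

```python
import numpy as np

def clip(g, c):
    """clip_c(g): rescale g so that its Euclidean norm is at most c."""
    norm = np.linalg.norm(g)
    return g if norm <= c else (c / norm) * g

def clipped_sgd_quadratic(stepsize, c, num_steps=10_000, dim=100, L=0.1, seed=0):
    """x_{t+1} = x_t - stepsize * clip_c(grad f_xi(x_t)) on
    f(x, xi) = (L/2)||x||^2 + <x, xi>, xi ~ chi^2(1) coordinate-wise (L = 0.1, dim = 100)."""
    rng = np.random.default_rng(seed)
    x = np.ones(dim)                              # starting point not given in the excerpt; assumed
    true_grad_norms = []
    for _ in range(num_steps):
        xi = rng.chisquare(df=1, size=dim)        # coordinate-wise chi-squared noise, 1 d.o.f.
        x = x - stepsize * clip(L * x + xi, c)    # stochastic gradient of f(x, xi) is L*x + xi
        # gradient of the expected objective is L*x + E[xi], and E[chi^2(1)] = 1
        true_grad_norms.append(np.linalg.norm(L * x + 1.0))
    return x, true_grad_norms
```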
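
The 'Experiment Setup' row describes tuning the stepsize for each clipping threshold c so that a target gradient norm ϵ = 10⁻² is reached fastest. A hedged sketch of such a sweep, reusing clipped_sgd_quadratic from the previous block (the threshold and stepsize grids below are ours, not the paper's):

```python
import numpy as np

TARGET_EPS = 1e-2                          # target gradient norm quoted in the excerpt
thresholds = [1.0, 0.1, 0.01]              # assumed grid of clipping thresholds c
stepsizes = np.logspace(-3, 0, 7)          # assumed grid of candidate stepsizes

for c in thresholds:
    best = None                            # (stepsize, first step at which the target is reached)
    for eta in stepsizes:
        _, norms = clipped_sgd_quadratic(stepsize=eta, c=c)
        hit = next((t for t, n in enumerate(norms) if n <= TARGET_EPS), None)
        if hit is not None and (best is None or hit < best[1]):
            best = (eta, hit)
    print(f"c = {c:g}:",
          f"best stepsize {best[0]:.2e} reaches eps in {best[1]} steps" if best
          else "target not reached on this grid")
```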
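
For the w1a experiment noted in the 'Open Datasets' row, the excerpt only states that logistic regression is run with batch size 1. A possible reconstruction, assuming a local LIBSVM-format copy of w1a readable with scikit-learn's load_svmlight_file; the stepsize, threshold, and step count are illustrative, not taken from the paper:

```python
import numpy as np
from sklearn.datasets import load_svmlight_file

def clipped_sgd_logreg(path="w1a", stepsize=0.1, c=1.0, num_steps=50_000, seed=0):
    """Batch-size-1 clipped SGD on logistic regression over w1a (LIBSVM format).
    Hyperparameters here are illustrative, not the paper's."""
    X, y = load_svmlight_file(path)               # labels are in {-1, +1}
    X = X.toarray()
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    for _ in range(num_steps):
        i = rng.integers(len(y))                  # draw one sample (batch size = 1)
        margin = y[i] * X[i].dot(w)
        g = -y[i] * X[i] / (1.0 + np.exp(margin)) # gradient of log(1 + exp(-y_i <w, x_i>))
        norm = np.linalg.norm(g)
        if norm > c:
            g *= c / norm                         # clip_c
        w -= stepsize * g
    return w
```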