Stability and Convergence of Stochastic Gradient Clipping: Beyond Lipschitz Continuity and Smoothness

Authors: Vien V. Mai, Mikael Johansson

ICML 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Numerical results confirm our theoretical developments. Our experiments on phase retrieval, absolute linear regression, and classification with neural networks reaffirm our theoretical findings that gradient clipping can: i) stabilize and guarantee convergence for problems with rapidly growing gradients; ii) retain and sometimes improve the best performance of their unclipped counterparts even on standard problems."
Researcher Affiliation | Academia | "Division of Decision and Control Systems, EECS, KTH Royal Institute of Technology, Stockholm, Sweden."
Pseudocode | No | The algorithm is described by the update equations (4a) and (4b) but is not presented in a formal pseudocode or algorithm block. (A hedged sketch of a generic clipped-gradient step is given after this table.)
Open Source Code | No | The paper does not provide a statement or link indicating that code for the described methodology is open-sourced.
Open Datasets | Yes | "For our last set of experiments, we consider the image classification task on the CIFAR10 dataset (Krizhevsky et al., 2009)"
Dataset Splits | No | The paper mentions the CIFAR10 dataset and the mini-batch size but does not specify training, validation, or test splits (e.g., percentages or sample counts).
Hardware Specification | No | "The computations were enabled by resources provided by the Swedish National Infrastructure for Computing (SNIC), partially funded by the Swedish Research Council through grant agreement no. 2018-05973." This statement is too general and does not identify specific hardware models or specifications.
Software Dependencies | No | The paper mentions PyTorch but does not give version numbers for any software dependencies.
Experiment Setup | Yes | "Following common practice, we use mini-batch size 128, momentum parameter β = 0.9, and weight-decay coefficient 5 × 10⁻⁴ in all experiments. For the stepsizes, we use constant values starting with α₀ and reduce them by a factor of 10 every 50 epochs." (A configuration sketch matching these stated settings appears after this table.)
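Because the report only notes that the method is given by equations (4a) and (4b) without reproducing them, the following is a minimal sketch of a generic norm-clipped stochastic gradient step in PyTorch. The function name `clipped_sgd_step`, the clipping threshold, and the stepsize are illustrative placeholders, not the authors' exact update rule.

```python
import torch

def clipped_sgd_step(params, stepsize, clip_level):
    """Generic norm-clipped SGD update (illustrative, not the paper's exact
    (4a)-(4b)): rescale the stochastic gradient so the step length is at
    most stepsize * clip_level."""
    with torch.no_grad():
        grads = [p.grad for p in params if p.grad is not None]
        # Global gradient norm across all parameters.
        total_norm = torch.norm(torch.stack([g.norm() for g in grads]))
        # Shrink the step whenever the gradient norm exceeds the clip level.
        scale = min(1.0, clip_level / (total_norm.item() + 1e-12))
        for p in params:
            if p.grad is not None:
                p.add_(p.grad, alpha=-stepsize * scale)

# Toy usage on a quadratic loss (illustrative only).
w = torch.randn(5, requires_grad=True)
loss = (w ** 2).sum()
loss.backward()
clipped_sgd_step([w], stepsize=0.1, clip_level=1.0)
```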
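As a hedged illustration of the reported setup (mini-batch size 128, momentum 0.9, weight decay 5 × 10⁻⁴, stepsizes reduced by a factor of 10 every 50 epochs), the sketch below wires those values into a standard PyTorch CIFAR10 training loop. The initial stepsize `alpha0`, the ResNet-18 architecture, the 150-epoch budget, and the clipping threshold `max_norm=1.0` are assumptions not stated in the report, and `clip_grad_norm_` is a standard clipping utility rather than the paper's exact update (4a)-(4b).

```python
import torch
import torchvision
import torchvision.transforms as T

alpha0 = 0.1  # placeholder initial stepsize; not specified in the report

# CIFAR10 with the reported mini-batch size of 128.
train_set = torchvision.datasets.CIFAR10(
    root="./data", train=True, download=True, transform=T.ToTensor())
train_loader = torch.utils.data.DataLoader(
    train_set, batch_size=128, shuffle=True)

model = torchvision.models.resnet18(num_classes=10)  # placeholder architecture
# Reported hyperparameters: momentum 0.9, weight decay 5e-4.
optimizer = torch.optim.SGD(
    model.parameters(), lr=alpha0, momentum=0.9, weight_decay=5e-4)
# Reported schedule: divide the stepsize by 10 every 50 epochs.
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=50, gamma=0.1)

loss_fn = torch.nn.CrossEntropyLoss()
for epoch in range(150):  # epoch budget assumed for illustration
    for x, y in train_loader:
        optimizer.zero_grad()
        loss_fn(model(x), y).backward()
        # Clipping threshold is illustrative; the report does not state one.
        torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
        optimizer.step()
    scheduler.step()
```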