Stability and Convergence of Stochastic Gradient Clipping: Beyond Lipschitz Continuity and Smoothness
Authors: Vien V. Mai, Mikael Johansson
ICML 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Numerical results confirm our theoretical developments. Our experiments on phase retrieval, absolute linear regression, and classification with neural networks reaffirm our theoretical findings that gradient clipping can: i) stabilize and guarantee convergence for problems with rapidly growing gradients; ii) retain and sometimes improve the best performance of their unclipped counterparts even on standard problems. |
| Researcher Affiliation | Academia | Division of Decision and Control Systems, EECS, KTH Royal Institute of Technology, Stockholm, Sweden. |
| Pseudocode | No | The algorithm is described using mathematical equations (4a) and (4b) but not presented in a formal pseudocode or algorithm block. |
| Open Source Code | No | The paper does not provide an explicit statement or link for the open-sourcing of the code for the methodology described. |
| Open Datasets | Yes | For our last set of experiments, we consider the image classification task on the CIFAR10 dataset (Krizhevsky et al., 2009) |
| Dataset Splits | No | The paper mentions using the CIFAR10 dataset and mini-batch size but does not explicitly provide details about specific training, validation, or test dataset splits (e.g., percentages or sample counts). |
| Hardware Specification | No | The computations were enabled by resources provided by the Swedish National Infrastructure for Computing (SNIC), partially funded by the Swedish Research Council through grant agreement no. 2018-05973. This statement is too general and does not provide specific hardware models or specifications. |
| Software Dependencies | No | The paper mentions 'PyTorch' but does not provide specific version numbers for any software dependencies. |
| Experiment Setup | Yes | Following common practice, we use mini-batch size 128, momentum parameter β = 0.9, and weight-decay coefficient 5·10⁻⁴ in all experiments. For the stepsizes, we use constant values starting with α0 and reduce them by a factor of 10 every 50 epochs. (A hedged configuration sketch based on these values appears below the table.) |
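
As a reading aid for the Experiment Setup row, the reported hyperparameters (mini-batch size 128, momentum β = 0.9, weight decay 5·10⁻⁴, stepsizes cut by a factor of 10 every 50 epochs) map onto a standard PyTorch training configuration. The sketch below is illustrative only: the model, the initial stepsize `alpha0`, and the clipping threshold `clip_threshold` are placeholders not taken from the paper, and `clip_grad_norm_` is used as a generic stand-in for gradient clipping, not as the authors' exact update (4a)-(4b).

```python
import torch
from torch import nn, optim
from torch.optim.lr_scheduler import StepLR
from torch.nn.utils import clip_grad_norm_

# Placeholders (not specified in the excerpts above): model architecture,
# initial stepsize alpha0, and clipping threshold.
model = nn.Linear(3 * 32 * 32, 10)        # stand-in for the CIFAR10 network
alpha0 = 0.1                              # initial stepsize (assumed)
clip_threshold = 1.0                      # gradient-clipping threshold (assumed)

# Hyperparameters as reported: momentum 0.9, weight decay 5e-4.
optimizer = optim.SGD(model.parameters(), lr=alpha0,
                      momentum=0.9, weight_decay=5e-4)
# Reduce the stepsize by a factor of 10 every 50 epochs.
scheduler = StepLR(optimizer, step_size=50, gamma=0.1)

def train_one_epoch(loader, loss_fn):
    """One epoch over mini-batches (size 128 in the paper) with clipped gradients."""
    for x, y in loader:
        optimizer.zero_grad()
        loss = loss_fn(model(x.view(x.size(0), -1)), y)
        loss.backward()
        # Generic norm clipping; the paper's precise clipped update may differ.
        clip_grad_norm_(model.parameters(), max_norm=clip_threshold)
        optimizer.step()

# scheduler.step() would be called once per epoch in the outer training loop.
```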