High Probability Guarantees for Nonconvex Stochastic Gradient Descent with Heavy Tails
Authors: Shaojie Li, Yong Liu
ICML 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | In this paper, we develop high probability bounds for nonconvex SGD with a joint perspective of optimization and generalization performance. Instead of the light tail assumption, we consider the gradient noise following a heavy-tailed sub-Weibull distribution, a novel class generalizing the sub-Gaussian and sub-Exponential families to potentially heavier-tailed distributions. Under these complicated settings, we first present high probability bounds with best-known rates in general nonconvex learning, then move to nonconvex learning with a gradient dominance curvature condition, for which we improve the learning guarantees to fast rates. (Reference statements of the sub-Weibull and gradient dominance conditions are sketched after this table.) |
| Researcher Affiliation | Academia | Gaoling School of Artificial Intelligence, Renmin University of China, Beijing, China; Beijing Key Laboratory of Big Data Management and Analysis Methods, Beijing, China. Correspondence to: Yong Liu <liuyonggsai@ruc.edu.cn>. |
| Pseudocode | Yes | Algorithm 1: SGD; Algorithm 2: SGD with Clipping (a generic sketch of clipped SGD follows the table). |
| Open Source Code | No | The paper does not provide any explicit statements about releasing source code, nor does it include links to a code repository for the methodology described. |
| Open Datasets | No | This is a theoretical paper that does not involve empirical experiments or the use of datasets for training or evaluation. |
| Dataset Splits | No | This is a theoretical paper and does not discuss dataset splits for training, validation, or testing. |
| Hardware Specification | No | The paper focuses on theoretical analysis and does not describe any experimental hardware specifications. |
| Software Dependencies | No | The paper focuses on theoretical analysis and does not mention any software dependencies with specific version numbers for implementation or experimentation. |
| Experiment Setup | No | The paper is theoretical and does not include details on experimental setup, hyperparameters, or system-level training settings. |
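For reference alongside the Research Type row, below are standard statements of the two conditions the abstract names. The exact constants and parameterization in the paper may differ; this follows the common sub-Weibull convention of Vladimirova et al. (2020) and the usual Polyak-Lojasiewicz form of gradient dominance.

```latex
% Sub-Weibull tails (a standard convention; the paper's exact
% parameterization may differ): X is sub-Weibull(\theta) if,
% for some scale K > 0,
\[
  \Pr\bigl(|X| \ge t\bigr) \le 2\exp\!\bigl(-(t/K)^{1/\theta}\bigr)
  \qquad \text{for all } t \ge 0,
\]
% where \theta = 1/2 recovers sub-Gaussian tails, \theta = 1
% sub-exponential tails, and \theta > 1 allows heavier tails.

% Gradient dominance (Polyak-Lojasiewicz) with parameter \mu > 0:
\[
  F(\mathbf{w}) - \inf_{\mathbf{v}} F(\mathbf{v})
  \le \frac{1}{2\mu}\,\bigl\|\nabla F(\mathbf{w})\bigr\|^{2}
  \qquad \text{for all } \mathbf{w}.
\]
```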
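The Pseudocode row lists SGD and SGD with Clipping. As an illustration only, and not the paper's exact Algorithm 2 (`grad_fn`, `eta`, and `tau` are hypothetical names introduced here), a minimal NumPy sketch of clipped SGD under heavy-tailed gradient noise:

```python
import numpy as np

def clipped_sgd(grad_fn, w0, eta=0.1, tau=1.0, n_steps=1000, seed=0):
    """SGD with gradient clipping: a generic sketch, not the paper's
    exact algorithm. grad_fn(w, rng) returns a stochastic gradient."""
    rng = np.random.default_rng(seed)
    w = np.asarray(w0, dtype=float).copy()
    for _ in range(n_steps):
        g = grad_fn(w, rng)
        norm = np.linalg.norm(g)
        if norm > tau:            # project the gradient onto the ball
            g = g * (tau / norm)  # of radius tau to tame heavy tails
        w -= eta * g              # plain SGD update
    return w

# Usage: least squares with heavy-tailed (Student-t) gradient noise.
A = np.array([[2.0, 0.0], [0.0, 1.0]])
b = np.array([1.0, -1.0])

def grad_fn(w, rng):
    noise = rng.standard_t(df=2.0, size=w.shape)  # heavy-tailed noise
    return A @ w - b + noise

w_hat = clipped_sgd(grad_fn, w0=np.zeros(2), eta=0.05, tau=5.0)
print(w_hat)
```

Clipping bounds each update's norm by `tau`, the standard device for obtaining high probability guarantees when the gradient noise lacks light (sub-Gaussian) tails.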