High Probability Guarantees for Nonconvex Stochastic Gradient Descent with Heavy Tails

Authors: Shaojie Li, Yong Liu

ICML 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | In this paper, we develop high probability bounds for nonconvex SGD from a joint perspective of optimization and generalization performance. Instead of the light-tail assumption, we consider gradient noise following a heavy-tailed sub-Weibull distribution, a novel class generalizing the sub-Gaussian and sub-exponential families to potentially heavier-tailed distributions. Under this setting, we first present high probability bounds with best-known rates for general nonconvex learning, and then move to nonconvex learning under a gradient dominance curvature condition, for which we improve the learning guarantees to fast rates. (The sub-Weibull tail condition and the gradient dominance condition are written out after this table.)
Researcher Affiliation | Academia | (1) Gaoling School of Artificial Intelligence, Renmin University of China, Beijing, China; (2) Beijing Key Laboratory of Big Data Management and Analysis Methods, Beijing, China. Correspondence to: Yong Liu <liuyonggsai@ruc.edu.cn>.
Pseudocode | Yes | The paper provides Algorithm 1 (SGD) and Algorithm 2 (SGD with Clipping); a minimal sketch of both update rules is given after this table.
Open Source Code | No | The paper does not provide any explicit statements about releasing source code, nor does it include links to a code repository for the methodology described.
Open Datasets | No | This is a theoretical paper that does not involve empirical experiments or the use of datasets for training or evaluation.
Dataset Splits | No | This is a theoretical paper and does not discuss dataset splits for training, validation, or testing.
Hardware Specification | No | The paper focuses on theoretical analysis and does not describe any experimental hardware specifications.
Software Dependencies | No | The paper focuses on theoretical analysis and does not mention any software dependencies with specific version numbers for implementation or experimentation.
Experiment Setup | No | The paper is theoretical and does not include details on experimental setup, hyperparameters, or system-level training settings.
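
For reference, the two assumptions named in the Research Type row can be stated as follows. This is a sketch using the standard forms from the literature; the symbols (tail parameter theta, scale K, gradient dominance constant mu, optimal value F*) are generic notation and may differ from the paper's exact statements.

    % Sub-Weibull(\theta) gradient noise: larger \theta means heavier tails;
    % \theta = 1/2 recovers sub-Gaussian and \theta = 1 recovers sub-exponential noise.
    \[
      \mathbb{P}\bigl( \lVert \nabla f(w; z) - \nabla F(w) \rVert \ge t \bigr)
        \le 2 \exp\!\bigl( -(t/K)^{1/\theta} \bigr), \qquad t \ge 0 .
    \]
    % Gradient dominance (Polyak-Lojasiewicz) condition with parameter \mu > 0,
    % the curvature assumption under which the fast rates are derived:
    \[
      \lVert \nabla F(w) \rVert^{2} \ge 2\mu \bigl( F(w) - F^{*} \bigr)
        \quad \text{for all } w .
    \]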
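
To make the Pseudocode row concrete, the sketch below illustrates the two update rules named there: plain SGD and SGD with gradient clipping, the usual device for heavy-tailed noise. It is not the authors' code; the step size eta, clipping threshold tau, and the stochastic_gradient oracle are illustrative placeholders.

    import numpy as np

    def sgd(w0, stochastic_gradient, eta=0.1, num_steps=1000):
        """Plain SGD: w_{t+1} = w_t - eta * g_t with a stochastic gradient g_t."""
        w = np.asarray(w0, dtype=float)
        for _ in range(num_steps):
            w = w - eta * stochastic_gradient(w)
        return w

    def clipped_sgd(w0, stochastic_gradient, eta=0.1, tau=1.0, num_steps=1000):
        """SGD with clipping: rescale the stochastic gradient to norm <= tau
        before the update, which tames heavy-tailed gradient noise."""
        w = np.asarray(w0, dtype=float)
        for _ in range(num_steps):
            g = stochastic_gradient(w)
            norm = np.linalg.norm(g)
            if norm > tau:
                g = g * (tau / norm)
            w = w - eta * g
        return w

    # Toy usage: a quadratic objective with heavy-tailed (Student-t) gradient noise.
    rng = np.random.default_rng(0)
    noisy_grad = lambda w: 2.0 * w + rng.standard_t(df=2, size=w.shape)
    w_hat = clipped_sgd(np.ones(5), noisy_grad, eta=0.01, tau=5.0, num_steps=2000)

The clipping threshold is held constant here for simplicity; the paper's analysis may tie it to the horizon and the confidence level, so treat tau as a tunable placeholder.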