reproducibilityindex.ai

High Probability Guarantees for Nonconvex Stochastic Gradient Descent with Heavy Tails

Authors: Shaojie Li, Yong Liu

ICML 2022

Reproducibility Variable	Result	LLM Response
Research Type	Theoretical	In this paper, we develop high probability bounds for nonconvex SGD with a joint perspective of optimization and generalization performance. Instead of the light tail assumption, we consider the gradient noise following a heavy-tailed sub-Weibull distribution, a novel class generalizing the sub-Gaussian and sub-Exponential families to potentially heavier-tailed distributions. Under these complicated settings, we first present high probability bounds with best-known rates in general nonconvex learning, then move to nonconvex learning with a gradient dominance curvature condition, for which we improve the learning guarantees to fast rates.
Researcher Affiliation	Academia	(1) Gaoling School of Artificial Intelligence, Renmin University of China, Beijing, China; (2) Beijing Key Laboratory of Big Data Management and Analysis Methods, Beijing, China. Correspondence to: Yong Liu <liuyonggsai@ruc.edu.cn>.
Pseudocode	Yes	Algorithm 1: SGD; Algorithm 2: SGD with Clipping
Open Source Code	No	The paper does not provide any explicit statements about releasing source code, nor does it include links to a code repository for the methodology described.
Open Datasets	No	This is a theoretical paper that does not involve empirical experiments or the use of datasets for training or evaluation.
Dataset Splits	No	This is a theoretical paper and does not discuss dataset splits for training, validation, or testing.
Hardware Specification	No	The paper focuses on theoretical analysis and does not describe any experimental hardware specifications.
Software Dependencies	No	The paper focuses on theoretical analysis and does not mention any software dependencies with specific version numbers for implementation or experimentation.
Experiment Setup	No	The paper is theoretical and does not include details on experimental setup, hyperparameters, or system-level training settings.
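
The Pseudocode row names two procedures, SGD and SGD with clipping, but the entry does not reproduce them. As a rough illustration only, the Python sketch below shows the generic textbook form of both: a plain stochastic gradient step, and the same step with the stochastic gradient rescaled to a maximum Euclidean norm. It is not the paper's Algorithm 1 or 2. The names `clip_by_norm`, `sgd`, `noisy_quadratic_grad`, the quadratic objective, the noise model, and all parameter values are assumptions made for this example.

```python
import numpy as np


def clip_by_norm(g, tau):
    """Rescale g so that its Euclidean norm is at most tau (assumed clipping rule)."""
    norm = np.linalg.norm(g)
    if norm > tau:
        return g * (tau / norm)
    return g


def sgd(grad_fn, w0, step_size, num_steps, rng, clip_threshold=None):
    """Run SGD with a constant step size.

    If clip_threshold is given, each stochastic gradient is clipped by norm
    before the update; otherwise this is plain SGD.
    """
    w = np.asarray(w0, dtype=float).copy()
    for _ in range(num_steps):
        g = grad_fn(w, rng)
        if clip_threshold is not None:
            g = clip_by_norm(g, clip_threshold)
        w = w - step_size * g
    return w


def noisy_quadratic_grad(w, rng, theta=2.0):
    """Gradient of 0.5 * ||w||^2 plus symmetric heavy-tailed noise.

    Hypothetical noise model for illustration: sign(z) * |z|^(2 * theta) with
    z standard normal, whose tails are heavier than Gaussian for theta > 1/2.
    """
    z = rng.standard_normal(w.shape)
    noise = np.sign(z) * np.abs(z) ** (2.0 * theta)
    return w + noise


if __name__ == "__main__":
    w0 = np.ones(5)
    plain = sgd(noisy_quadratic_grad, w0, 0.01, 1000, np.random.default_rng(0))
    clipped = sgd(noisy_quadratic_grad, w0, 0.01, 1000,
                  np.random.default_rng(0), clip_threshold=5.0)
    print("plain SGD   ||w|| =", np.linalg.norm(plain))
    print("clipped SGD ||w|| =", np.linalg.norm(clipped))
```

With clipping enabled, each update moves the iterate by at most `step_size * clip_threshold`, so a single extreme noise draw cannot move it arbitrarily far. This bounded-step property is the usual motivation for clipping under heavy-tailed gradient noise.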