Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
High Probability Guarantees for Nonconvex Stochastic Gradient Descent with Heavy Tails
Authors: Shaojie Li, Yong Liu
ICML 2022 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | In this paper, we develop high probability bounds for nonconvex SGD with a joint perspective of optimization and generalization performance. Instead of the light tail assumption, we consider the gradient noise following a heavy-tailed sub-Weibull distribution, a novel class generalizing the sub-Gaussian and sub-Exponential families to potentially heavier-tailed distributions. Under these complicated settings, we first present high probability bounds with best-known rates in general nonconvex learning, then move to nonconvex learning with a gradient dominance curvature condition, for which we improve the learning guarantees to fast rates. |
| Researcher Affiliation | Academia | (1) Gaoling School of Artificial Intelligence, Renmin University of China, Beijing, China; (2) Beijing Key Laboratory of Big Data Management and Analysis Methods, Beijing, China. Correspondence to: Yong Liu <EMAIL>. |
| Pseudocode | Yes | Algorithm 1: SGD; Algorithm 2: SGD with Clipping |
| Open Source Code | No | The paper does not provide any explicit statements about releasing source code, nor does it include links to a code repository for the methodology described. |
| Open Datasets | No | This is a theoretical paper that does not involve empirical experiments or the use of datasets for training or evaluation. |
| Dataset Splits | No | This is a theoretical paper and does not discuss dataset splits for training, validation, or testing. |
| Hardware Specification | No | The paper focuses on theoretical analysis and does not describe any experimental hardware specifications. |
| Software Dependencies | No | The paper focuses on theoretical analysis and does not mention any software dependencies with specific version numbers for implementation or experimentation. |
| Experiment Setup | No | The paper is theoretical and does not include details on experimental setup, hyperparameters, or system-level training settings. |
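
The Pseudocode row lists "SGD with Clipping" without reproducing it. As a rough illustration only, the following is a minimal sketch of norm-clipped SGD under heavy-tailed gradient noise. It is not the paper's Algorithm 2: the objective, the noise model (a symmetrized Weibull draw with shape 0.5), the step size, the clipping threshold `tau`, and all function names are assumptions chosen for the example.

```python
import math
import random


def clip_by_norm(grad, tau):
    """Rescale grad so that its Euclidean norm is at most tau."""
    norm = math.sqrt(sum(g * g for g in grad))
    if norm <= tau:
        return list(grad)
    scale = tau / norm
    return [g * scale for g in grad]


def clipped_sgd(grad_fn, w0, step_size, tau, num_steps, rng):
    """Run SGD where each stochastic gradient is norm-clipped before the update."""
    w = list(w0)
    for _ in range(num_steps):
        g = clip_by_norm(grad_fn(w, rng), tau)
        w = [wi - step_size * gi for wi, gi in zip(w, g)]
    return w


def noisy_grad(w, rng):
    """Stochastic gradient of the nonconvex f(w) = sum_i w_i^2 / (1 + w_i^2).

    The additive noise is a symmetrized Weibull draw with shape 0.5, which has
    heavier tails than an exponential. This noise model is an assumption made
    for illustration, not a setting taken from the paper.
    """
    grad = []
    for wi in w:
        true_grad = 2.0 * wi / (1.0 + wi * wi) ** 2
        sign = 1.0 if rng.random() < 0.5 else -1.0
        noise = sign * rng.weibullvariate(1.0, 0.5)
        grad.append(true_grad + 0.1 * noise)
    return grad


if __name__ == "__main__":
    rng = random.Random(0)
    w_final = clipped_sgd(noisy_grad, [1.5, -1.0], 0.05, 1.0, 2000, rng)
    print(w_final)
```

Clipping bounds the size of every update by `step_size * tau`, so a single extreme noise draw cannot move the iterate arbitrarily far; this is the usual motivation for clipping when the noise is heavy-tailed.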