Convergence Rates of Stochastic Gradient Descent under Infinite Noise Variance
Authors: Hongjian Wang, Mert Gürbüzbalaban, Lingjiong Zhu, Umut Şimşekli, Murat A. Erdogdu
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | In this paper, we provide convergence guarantees for SGD under a state-dependent and heavy-tailed noise with a potentially infinite variance, for a class of strongly convex objectives. In the case where the p-th moment of the noise exists for some p ∈ [1, 2), we first identify a condition on the Hessian, coined p-positive (semi-)definiteness, that leads to an interesting interpolation between the positive semi-definite cone (p = 2) and the cone of diagonally dominant matrices with non-negative diagonal entries (p = 1). Under this condition, we provide a convergence rate for the distance to the global optimum in L^p. Furthermore, we provide a generalized central limit theorem, which shows that the properly scaled Polyak-Ruppert averaging converges weakly to a multivariate α-stable random vector. (A simulation sketch of this noise regime appears after this table.) |
| Researcher Affiliation | Academia | Hongjian Wang, Carnegie Mellon University, hjnwang@cmu.edu; Mert Gürbüzbalaban, Rutgers University, mg1366@rutgers.edu; Lingjiong Zhu, Florida State University, zhu@math.fsu.edu; Umut Şimşekli, INRIA & ENS PSL Research University, umut.simsekli@inria.fr; Murat A. Erdogdu, University of Toronto & Vector Institute, erdogdu@cs.toronto.edu |
| Pseudocode | No | The paper describes the SGD algorithm using mathematical notation (Equation 1.2) but does not include any structured pseudocode block or algorithm listing. (A generic form of that recursion is sketched after this table.) |
| Open Source Code | No | The paper does not contain any explicit statements or links indicating that source code for the described methodology is publicly available. |
| Open Datasets | No | The paper discusses theoretical applications to 'Ordinary Least Squares' and 'Generalized Linear Models' using concepts like 'random vector z' and 'stream of i.i.d. drawn instances of the pair (y, z)', but it does not specify concrete, publicly available datasets or provide access information for any data used in an empirical sense. |
| Dataset Splits | No | The paper does not describe specific training, validation, or test dataset splits. Its focus is theoretical analysis and convergence guarantees rather than empirical evaluation requiring data partitioning details. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., GPU/CPU models, memory) used for running experiments. The paper focuses on theoretical analysis rather than empirical experimentation requiring hardware specifications. |
| Software Dependencies | No | The paper does not specify any software dependencies with version numbers. Its content is theoretical, and it does not describe an implementation that would require such details for reproducibility. |
| Experiment Setup | No | The paper does not provide specific details about an experimental setup, such as hyperparameter values, training configurations, or system-level settings. The paper is theoretical and focuses on mathematical properties of SGD. |
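
For reference, the recursion behind the rows above (the paper's Equation 1.2 is only described in prose here) can be written generically as follows. This is a reconstruction from the abstract, assuming standard SGD notation, not a verbatim copy of the paper:

$$
x_{k+1} = x_k - \eta_k \bigl( \nabla f(x_k) + \xi_k(x_k) \bigr), \qquad \bar{x}_K = \frac{1}{K} \sum_{k=1}^{K} x_k,
$$

where f is strongly convex, the state-dependent noise ξ_k(x) has a finite p-th moment for some p ∈ [1, 2) but possibly infinite variance, and x̄_K is the Polyak-Ruppert average whose properly scaled law converges weakly to a multivariate α-stable random vector.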
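
Since the paper releases no code, the following is a minimal simulation sketch of the regime it analyzes: SGD on a strongly convex quadratic driven by symmetric α-stable gradient noise (infinite variance for α < 2, finite p-th moment for every p < α), with Polyak-Ruppert averaging. Every concrete choice here (the dimension, α = 1.8, the η₀/k step-size schedule) is an illustrative assumption, not a setting taken from the paper.

```python
import numpy as np
from scipy.stats import levy_stable  # sampler for alpha-stable laws

rng = np.random.default_rng(0)
d, alpha, eta0, n_steps = 5, 1.8, 0.1, 5_000
A = np.diag(np.linspace(1.0, 2.0, d))  # Hessian of f(x) = 0.5 x^T A x; optimum x* = 0
x = np.ones(d)                         # initial iterate
running_sum = np.zeros(d)

for k in range(1, n_steps + 1):
    # Symmetric (beta = 0) alpha-stable noise: variance is infinite for
    # alpha < 2, yet E||xi||^p is finite for any moment order p < alpha.
    xi = levy_stable.rvs(alpha, 0.0, size=d, random_state=rng)
    x = x - (eta0 / k) * (A @ x + xi)  # SGD step with decreasing step size eta_k = eta0 / k
    running_sum += x

x_bar = running_sum / n_steps          # Polyak-Ruppert average
print("last iterate ||x_K - x*||   =", np.linalg.norm(x))
print("averaged     ||x_bar - x*|| =", np.linalg.norm(x_bar))
```

Because the noise variance is infinite, occasional large jumps dominate the last iterate; the averaged iterate is the quantity whose scaled fluctuations the paper's generalized CLT characterizes as α-stable rather than Gaussian.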