Convergence Rates of Stochastic Gradient Descent under Infinite Noise Variance
Authors: Hongjian Wang, Mert Gürbüzbalaban, Lingjiong Zhu, Umut Şimşekli, Murat A. Erdogdu
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | In this paper, we provide convergence guarantees for SGD under a state-dependent and heavy-tailed noise with a potentially infinite variance, for a class of strongly convex objectives. In the case where the p-th moment of the noise exists for some p ∈ [1, 2), we first identify a condition on the Hessian, coined p-positive (semi-)definiteness, that leads to an interesting interpolation between the positive semi-definite cone (p = 2) and the cone of diagonally dominant matrices with non-negative diagonal entries (p = 1). Under this condition, we provide a convergence rate for the distance to the global optimum in L^p. Furthermore, we provide a generalized central limit theorem, which shows that the properly scaled Polyak-Ruppert averaging converges weakly to a multivariate α-stable random vector. (A simulation sketch of this noise regime appears after this table.) |
| Researcher Affiliation | Academia | Hongjian Wang, Carnegie Mellon University, hjnwang@cmu.edu; Mert Gürbüzbalaban, Rutgers University, mg1366@rutgers.edu; Lingjiong Zhu, Florida State University, zhu@math.fsu.edu; Umut Şimşekli, INRIA & ENS PSL Research University, umut.simsekli@inria.fr; Murat A. Erdogdu, University of Toronto & Vector Institute, erdogdu@cs.toronto.edu |
| Pseudocode | No | The paper describes the SGD algorithm using mathematical notation (Equation 1.2) but does not include any structured pseudocode block or algorithm listing. (A generic form of that recursion is sketched after this table.) |
| Open Source Code | No | The paper does not contain any explicit statements or links indicating that source code for the described methodology is publicly available. |
| Open Datasets | No | The paper discusses theoretical applications to 'Ordinary Least Squares' and 'Generalized Linear Models' using concepts like 'random vector z' and 'stream of i.i.d. drawn instances of the pair (y, z)', but it does not specify concrete, publicly available datasets or provide access information for any data used in an empirical sense. |
| Dataset Splits | No | The paper does not describe specific training, validation, or test dataset splits. Its focus is theoretical analysis and convergence guarantees rather than empirical evaluation requiring data partitioning details. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., GPU/CPU models, memory) used for running experiments. The paper focuses on theoretical analysis rather than empirical experimentation requiring hardware specifications. |
| Software Dependencies | No | The paper does not specify any software dependencies with version numbers. Its content is theoretical, and it does not describe an implementation that would require such details for reproducibility. |
| Experiment Setup | No | The paper does not provide specific details about an experimental setup, such as hyperparameter values, training configurations, or system-level settings. The paper is theoretical and focuses on mathematical properties of SGD. |
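
For reference, the recursion behind the rows above (the paper's Equation 1.2 is only described in prose here) can be written generically as follows. This is a reconstruction from the abstract, assuming standard SGD notation, not a verbatim copy of the paper:

$$
x_{k+1} = x_k - \eta_k \bigl( \nabla f(x_k) + \xi_k(x_k) \bigr), \qquad \bar{x}_K = \frac{1}{K} \sum_{k=1}^{K} x_k,
$$

where f is strongly convex, the state-dependent noise ξ_k(x) has a finite p-th moment for some p ∈ [1, 2) but possibly infinite variance, and x̄_K is the Polyak-Ruppert average whose properly scaled law converges weakly to a multivariate α-stable random vector.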
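
Since the paper releases no code, the following is a minimal simulation sketch of the regime it analyzes: SGD on a strongly convex quadratic driven by symmetric α-stable gradient noise (infinite variance for α < 2, finite p-th moment for every p < α), with Polyak-Ruppert averaging. Every concrete choice here (the dimension, α = 1.8, the η₀/k step-size schedule) is an illustrative assumption, not a setting taken from the paper.

```python
import numpy as np
from scipy.stats import levy_stable  # sampler for alpha-stable laws

rng = np.random.default_rng(0)
d, alpha, eta0, n_steps = 5, 1.8, 0.1, 5_000
A = np.diag(np.linspace(1.0, 2.0, d))  # Hessian of f(x) = 0.5 x^T A x; optimum x* = 0
x = np.ones(d)                         # initial iterate
running_sum = np.zeros(d)

for k in range(1, n_steps + 1):
    # Symmetric (beta = 0) alpha-stable noise: variance is infinite for
    # alpha < 2, yet E||xi||^p is finite for any moment order p < alpha.
    xi = levy_stable.rvs(alpha, 0.0, size=d, random_state=rng)
    x = x - (eta0 / k) * (A @ x + xi)  # SGD step with decreasing step size eta_k = eta0 / k
    running_sum += x

x_bar = running_sum / n_steps          # Polyak-Ruppert average
print("last iterate ||x_K - x*||   =", np.linalg.norm(x))
print("averaged     ||x_bar - x*|| =", np.linalg.norm(x_bar))
```

Because the noise variance is infinite, occasional large jumps dominate the last iterate; the averaged iterate is the quantity whose scaled fluctuations the paper's generalized CLT characterizes as α-stable rather than Gaussian.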