High Probability Convergence of Stochastic Gradient Methods

Authors: Zijian Liu, Ta Duy Nguyen, Thien Hang Nguyen, Alina Ene, Huy Nguyen

ICML 2023

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | In this work, we describe a generic approach to show convergence with high probability for both stochastic convex and non-convex optimization with sub-Gaussian noise. In contrast to prior convex bounds that hold only in expectation or depend on the diameter of the domain, we show high probability convergence with bounds depending on the initial distance to the optimal solution. The method can also be applied to the non-convex case. For SGD, we demonstrate an O((1 + σ² log(1/δ))/T + σ/√T) convergence rate when the number of iterations T is known and an O((1 + σ² log(T/δ))/√T) convergence rate when T is unknown, where 1 − δ is the desired success probability. (A step-size sketch for the known versus unknown T cases follows the table.)
Researcher Affiliation | Academia | Stern School of Business, New York University; Department of Computer Science, Boston University; Khoury College of Computer Sciences, Northeastern University. Correspondence to: Zijian Liu <zl3067@nyu.edu>, Ta Duy Nguyen <taduy@bu.edu>, Thien Hang Nguyen <nguyen.thien@northeastern.edu>.
Pseudocode | Yes | Algorithm 1: Stochastic Mirror Descent; Algorithm 2: Accelerated Stochastic Mirror Descent (Lan, 2020); Algorithm 3: Stochastic Gradient Descent (SGD); Algorithm 4: AdaGrad-Norm. (A minimal AdaGrad-Norm sketch follows the table.)
Open Source Code | No | The paper provides a link to its full version on arXiv (https://arxiv.org/abs/2302.14843), but this is not a link to open-source code for the methods described in the paper, and there is no explicit statement about releasing code.
Open Datasets | No | The paper is theoretical and does not use or refer to any datasets for empirical evaluation. Therefore, no information about publicly available or open datasets is provided.
Dataset Splits | No | The paper is theoretical and does not describe empirical experiments or dataset usage. Therefore, no information on training, validation, or test dataset splits is provided.
Hardware Specification | No | The paper is purely theoretical and does not involve empirical experiments, thus no hardware specifications are mentioned.
Software Dependencies | No | The paper is theoretical and does not specify any software dependencies with version numbers for implementation or experimentation.
Experiment Setup | No | The paper is theoretical and does not describe an experimental setup with specific hyperparameters or system-level training settings.
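
The Research Type row quotes SGD rates that differ according to whether the iteration horizon T is known. Below is a minimal Python sketch of that distinction, assuming a constant step size proportional to 1/√T when T is known and a decreasing step size proportional to 1/√t otherwise; the step-size constants, the noisy-gradient oracle, and the iterate averaging are illustrative assumptions, not the paper's exact Algorithm 3.

```python
import numpy as np

def sgd_average(grad_oracle, x0, T=None, max_iters=1000, eta0=0.1):
    """Averaged SGD with two step-size regimes: constant ~1/sqrt(T) when the
    horizon T is known, decreasing ~1/sqrt(t) when it is not (illustrative)."""
    x = np.asarray(x0, dtype=float)
    n_steps = T if T is not None else max_iters
    avg = x.copy()
    for t in range(1, n_steps + 1):
        g = grad_oracle(x)                                   # noisy gradient at x
        eta = eta0 / np.sqrt(T) if T is not None else eta0 / np.sqrt(t)
        x = x - eta * g
        avg += (x - avg) / (t + 1)                           # running average of x_0, ..., x_t
    return avg

# Example: noisy gradients of f(x) = ||x||^2 / 2 with Gaussian noise.
rng = np.random.default_rng(0)
oracle = lambda x: x + 0.1 * rng.standard_normal(x.shape)
print(sgd_average(oracle, np.ones(5), T=10_000))
```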
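
The Pseudocode row also lists AdaGrad-Norm (Algorithm 4). Below is a minimal sketch of the standard AdaGrad-Norm update, which divides a base step size by the square root of the accumulated squared gradient norms; the constants and the oracle interface are illustrative assumptions, not the paper's exact algorithm.

```python
import numpy as np

def adagrad_norm(grad_oracle, x0, eta=1.0, b0=1e-3, n_steps=1000):
    """AdaGrad-Norm with a scalar adaptive step size (illustrative constants)."""
    x = np.asarray(x0, dtype=float)
    acc = b0 ** 2                                # accumulator b_t^2, seeded with b_0^2
    for _ in range(n_steps):
        g = grad_oracle(x)                       # noisy gradient at x
        acc += float(np.dot(g, g))               # add ||g_t||^2
        x = x - (eta / np.sqrt(acc)) * g         # x_{t+1} = x_t - (eta / b_t) g_t
    return x

# Example: same noisy quadratic oracle as above.
rng = np.random.default_rng(1)
oracle = lambda x: x + 0.1 * rng.standard_normal(x.shape)
print(adagrad_norm(oracle, np.ones(5), n_steps=10_000))
```

Because the step size adapts to the observed gradient norms, AdaGrad-Norm does not need the noise level σ or the horizon T as inputs, which is why it is often analyzed alongside the fixed-schedule SGD above.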