High Probability Convergence of Stochastic Gradient Methods
Authors: Zijian Liu, Ta Duy Nguyen, Thien Hang Nguyen, Alina Ene, Huy Nguyen
ICML 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | In this work, we describe a generic approach to show convergence with high probability for both stochastic convex and non-convex optimization with sub-Gaussian noise. Instead of bounds depending on the diameter of the domain, we show high probability convergence with bounds depending on the initial distance to the optimal solution. The method can be applied to the non-convex case. We demonstrate an O((1 + σ² log(1/δ))/T + σ/√T) convergence rate when the number of iterations T is known and an O((1 + σ² log(T/δ))/√T) convergence rate when T is unknown for SGD, where 1 − δ is the desired success probability. |
| Researcher Affiliation | Academia | 1Stern School of Business, New York University 2Department of Computer Science, Boston University 3Khoury College of Computer Sciences, Northeastern University. Correspondence to: Zijian Liu <zl3067@nyu.edu>, Ta Duy Nguyen <taduy@bu.edu>, Thien Hang Nguyen <nguyen.thien@northeastern.edu>. |
| Pseudocode | Yes | Algorithm 1: Stochastic Mirror Descent Algorithm; Algorithm 2: Accelerated Stochastic Mirror Descent Algorithm (Lan, 2020); Algorithm 3: Stochastic Gradient Descent (SGD); Algorithm 4: AdaGrad-Norm. |
| Open Source Code | No | The paper provides a link to the full version of the paper on arXiv (https://arxiv.org/abs/2302.14843), but this is not a link to open-source code for the methodology described in the paper. There is no explicit statement about releasing code. |
| Open Datasets | No | The paper is theoretical and does not use or refer to any datasets for empirical evaluation. Therefore, no information about publicly available or open datasets is provided. |
| Dataset Splits | No | The paper is theoretical and does not describe empirical experiments or dataset usage. Therefore, no information on training, validation, or test dataset splits is provided. |
| Hardware Specification | No | The paper is purely theoretical and does not involve empirical experiments, thus no hardware specifications are mentioned. |
| Software Dependencies | No | The paper is theoretical and does not specify any software dependencies with version numbers for implementation or experimentation. |
| Experiment Setup | No | The paper is theoretical and does not describe an experimental setup with specific hyperparameters or system-level training settings. |
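Since the paper's pseudocode is not accompanied by released code, the known-horizon SGD setting it analyzes can be sketched as follows. This is an illustrative minimal sketch, not the authors' implementation: the function and parameter names are ours, the step size η = 1/√T reflects the known-T regime described above, and additive Gaussian noise stands in for the sub-Gaussian stochastic gradient oracle assumed in the analysis.

```python
import math
import random


def sgd_known_horizon(grad_fn, x0, T, sigma=0.1, seed=0):
    """Plain SGD with a fixed step size chosen for a known horizon T.

    grad_fn returns the exact gradient at a point; Gaussian noise with
    standard deviation `sigma` models the sub-Gaussian gradient oracle.
    (Names and signature are illustrative, not from the paper.)
    """
    rng = random.Random(seed)
    x = x0
    eta = 1.0 / math.sqrt(T)  # known-horizon step size
    for _ in range(T):
        g = grad_fn(x) + rng.gauss(0.0, sigma)  # stochastic gradient sample
        x -= eta * g
    return x


# Usage: minimize f(x) = x^2 (exact gradient 2x), starting from x0 = 5
x_final = sgd_known_horizon(lambda x: 2.0 * x, 5.0, T=10_000)
```

In the unknown-horizon regime the paper instead uses a time-varying step size (on the order of 1/√t at step t), which yields the O((1 + σ² log(T/δ))/√T) rate quoted in the table.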