Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Almost Sure Convergence Rates Analysis and Saddle Avoidance of Stochastic Gradient Methods

Authors: Jun Liu, Ye Yuan

JMLR 2024 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | The vast majority of convergence rate analyses for stochastic gradient methods in the literature focus on convergence in expectation, whereas trajectory-wise almost sure convergence is clearly important to ensure that any instantiation of the stochastic algorithms converges with probability one. Here we provide a unified almost sure convergence rates analysis for stochastic gradient descent (SGD), stochastic heavy-ball (SHB), and stochastic Nesterov's accelerated gradient (SNAG) methods. We show, for the first time, that the almost sure convergence rates obtained for these stochastic gradient methods on strongly convex functions are arbitrarily close to their optimal possible convergence rates. For non-convex objective functions, we show not only that a weighted average of the squared gradient norms converges to zero almost surely, but also that the last iterates of the algorithms do. We further provide a last-iterate almost sure convergence rates analysis for stochastic gradient methods on general convex smooth functions, in contrast with most existing results in the literature, which only provide convergence in expectation for a weighted average of the iterates. The last-iterate almost sure convergence results also enable us to obtain almost sure avoidance of any strict saddle manifold by stochastic gradient methods with or without momentum.
Researcher Affiliation | Academia | Jun Liu EMAIL Department of Applied Mathematics, University of Waterloo, Waterloo, Canada; Ye Yuan EMAIL School of Artificial Intelligence and Automation, Huazhong University of Science and Technology, Wuhan, China
Pseudocode | Yes | The iteration of the SGD method is given by x_{t+1} = x_t - α_t g_t, t ≥ 1 (11)... The iteration of the SHB method is given by x_{t+1} = x_t - α_t g_t + β(x_t - x_{t-1}), t ≥ 1 (16)... The iteration of the SNAG method is given by y_{t+1} = x_t - α_t g_t, x_{t+1} = y_{t+1} + β(y_{t+1} - y_t), t ≥ 1 (24)
Open Source Code | No | The paper does not contain any explicit statements about the release of open-source code, nor does it provide links to any code repositories.
Open Datasets | No | The paper refers to popular non-convex optimization benchmark functions such as the Griewank, Rastrigin, and Levy functions, and provides a general URL (https://www.sfu.ca/~ssurjano/optimization.html) describing them. However, it does not use or provide access information for any specific experimental datasets for empirical evaluation.
Dataset Splits | No | As the paper is theoretical and does not describe experiments using specific datasets, there is no mention of training/test/validation dataset splits.
Hardware Specification | No | The paper does not describe any experimental results or the hardware used for computation. It is a theoretical paper focusing on convergence analysis.
Software Dependencies | No | The paper does not specify any software dependencies or version numbers, consistent with its theoretical nature.
Experiment Setup | No | The paper focuses on theoretical analysis and does not describe any empirical experimental setup, hyperparameters, or training configurations.
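The SGD, SHB, and SNAG iterations quoted in the Pseudocode row can be written down directly. The sketch below is illustrative only and is not from the paper: the noisy quadratic objective, the step-size schedule α_t = 1/t, and all variable names are assumptions made for the demo.

```python
import numpy as np

def sgd_step(x, grad, alpha):
    # SGD (Eq. 11): x_{t+1} = x_t - alpha_t * g_t
    return x - alpha * grad

def shb_step(x, x_prev, grad, alpha, beta):
    # SHB (Eq. 16): x_{t+1} = x_t - alpha_t * g_t + beta * (x_t - x_{t-1})
    return x - alpha * grad + beta * (x - x_prev)

def snag_step(x, y, grad, alpha, beta):
    # SNAG (Eq. 24): y_{t+1} = x_t - alpha_t * g_t,
    #                x_{t+1} = y_{t+1} + beta * (y_{t+1} - y_t)
    y_next = x - alpha * grad
    x_next = y_next + beta * (y_next - y)
    return x_next, y_next

# Illustrative run on f(x) = 0.5 * ||x||^2 (gradient is x), with small
# additive Gaussian noise standing in for the stochastic gradient g_t.
rng = np.random.default_rng(0)
x = np.array([2.0, -1.0])
for t in range(1, 200):
    g = x + 0.01 * rng.standard_normal(2)  # noisy gradient estimate
    x = sgd_step(x, g, alpha=1.0 / t)      # diminishing step size 1/t
```

With the diminishing step size, each sampled trajectory of this run settles near the minimizer, which is the trajectory-wise (almost sure) behavior the paper analyzes.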
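The Open Datasets row mentions the Rastrigin benchmark among the functions catalogued at the cited URL. Its standard closed form, f(x) = 10d + Σ_i (x_i² − 10 cos(2πx_i)) with global minimum f(0) = 0, is a common reference definition; the helper name below is our own.

```python
import numpy as np

def rastrigin(x):
    # Standard Rastrigin benchmark: highly multimodal, with many regularly
    # spaced local minima and a global minimum of 0 at the origin.
    x = np.asarray(x, dtype=float)
    return 10.0 * x.size + np.sum(x**2 - 10.0 * np.cos(2.0 * np.pi * x))
```

Its dense grid of spurious local minima is what makes it a common stress test for the kind of saddle- and local-structure analysis the paper carries out.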