SSRGD: Simple Stochastic Recursive Gradient Descent for Escaping Saddle Points

Authors: Zhize Li

NeurIPS 2019

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | We analyze stochastic gradient algorithms for optimizing nonconvex problems. In particular, our goal is to find local minima (second-order stationary points) instead of just first-order stationary points, which may be bad, unstable saddle points. We show that a simple perturbed version of the stochastic recursive gradient descent algorithm (called SSRGD) can find an (ϵ, δ)-second-order stationary point with Õ(√n/ϵ² + √n/δ⁴ + n/δ³) stochastic gradient complexity for nonconvex finite-sum problems. As a by-product, SSRGD finds an ϵ-first-order stationary point with O(n + √n/ϵ²) stochastic gradients. These results are almost optimal since Fang et al. [11] provided a lower bound Ω(√n/ϵ²) for finding even just an ϵ-first-order stationary point. We emphasize that the SSRGD algorithm for finding second-order stationary points is as simple as the one for finding first-order stationary points, differing only in occasionally adding a uniform perturbation, while all other algorithms for finding second-order stationary points with similar gradient complexity need to be combined with a negative-curvature search subroutine (e.g., Neon2 [4]). Moreover, the simple SSRGD algorithm admits a simpler analysis. (A sketch of this scheme appears after the table.)
Researcher Affiliation | Academia | Zhize Li, Tsinghua University, China and KAUST, Saudi Arabia; zhizeli.thu@gmail.com
Pseudocode | Yes | Algorithm 1: Simple Stochastic Recursive Gradient Descent (SSRGD); Algorithm 2: Simple Stochastic Recursive Gradient Descent (SSRGD)
Open Source Code | No | The paper does not provide any statement about releasing open-source code for the described methodology, nor does it include a link to a code repository.
Open Datasets | No | The paper is theoretical and discusses algorithm convergence for abstract problem forms (finite-sum and online/expectation problems). It does not mention or use any specific public or open datasets for training.
Dataset Splits | No | The paper is theoretical and does not describe any dataset splits (training, validation, test), as it does not conduct experiments on specific datasets.
Hardware Specification | No | The paper is theoretical and does not describe any experimental setup or hardware specifications (such as specific GPU/CPU models, memory, or cloud resources) used for running experiments.
Software Dependencies | No | The paper is theoretical and does not specify any software dependencies with version numbers (e.g., libraries, frameworks, or programming languages) required to reproduce experiments.
Experiment Setup | No | The paper does not provide specific experimental setup details such as concrete hyperparameter values for empirical runs, training configurations, or system-level settings. It discusses theoretical parameters like the step size and epoch length as part of its algorithm analysis, but not as part of an empirical setup.
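
The sketch below illustrates the idea the abstract describes: a SARAH-style recursive gradient estimator that restarts from a full gradient at each epoch, plus a uniform perturbation added when the gradient is small, which is the mechanism SSRGD uses to escape saddle points. This is a minimal illustration, not the paper's Algorithm 1/2: the toy finite-sum objective and the parameter values (eta, b, m, g_thres, r) are assumptions chosen for readability rather than the settings analyzed in the paper.

```python
# Minimal sketch of the SSRGD idea: recursive (SARAH-style) stochastic gradients
# plus an occasional uniform perturbation when the gradient is small.
import numpy as np

rng = np.random.default_rng(0)

# Toy nonconvex finite-sum problem: f(x) = (1/n) * sum_i f_i(x),
# with f_i(x) = 0.5 * x^T A_i x + b_i^T x (A_i symmetric, possibly indefinite).
n, d = 200, 10
A = rng.standard_normal((n, d, d))
A = (A + A.transpose(0, 2, 1)) / 2.0          # symmetrize each A_i
bvec = rng.standard_normal((n, d))

def grad_batch(x, idx):
    """Average gradient of the components f_i, i in idx: A_i x + b_i."""
    return (A[idx] @ x).mean(axis=0) + bvec[idx].mean(axis=0)

def full_grad(x):
    return grad_batch(x, np.arange(n))

def ssrgd_sketch(x0, eta=1e-3, b=16, m=20, epochs=50, g_thres=1e-2, r=1e-3):
    x = x0.copy()
    for _ in range(epochs):
        g = full_grad(x)
        if np.linalg.norm(g) <= g_thres:
            # Gradient is small: add a perturbation drawn uniformly from a
            # ball of radius r (the escape mechanism in the second-order variant).
            xi = rng.standard_normal(d)
            xi *= r * rng.uniform() ** (1.0 / d) / np.linalg.norm(xi)
            x = x + xi
            g = full_grad(x)
        v = g                                  # epoch starts from a full gradient
        for _ in range(m):
            x_new = x - eta * v
            idx = rng.choice(n, size=b, replace=False)
            # Recursive estimator: correct v with a minibatch gradient difference.
            v = grad_batch(x_new, idx) - grad_batch(x, idx) + v
            x = x_new
    return x

x_final = ssrgd_sketch(np.zeros(d))
print("final gradient norm:", np.linalg.norm(full_grad(x_final)))
```

The point of the sketch is the contrast the abstract emphasizes: the only change relative to the first-order version is the occasional uniform perturbation, with no negative-curvature search subroutine such as Neon2.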