SSRGD: Simple Stochastic Recursive Gradient Descent for Escaping Saddle Points

Authors: Zhize Li

NeurIPS 2019

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | We analyze stochastic gradient algorithms for optimizing nonconvex problems. In particular, our goal is to find local minima (second-order stationary points) instead of just first-order stationary points, which may be bad, unstable saddle points. We show that a simple perturbed version of the stochastic recursive gradient descent algorithm (called SSRGD) can find an (ϵ, δ)-second-order stationary point with Õ(√n/ϵ² + √n/δ⁴ + n/δ³) stochastic gradient complexity for nonconvex finite-sum problems. As a by-product, SSRGD finds an ϵ-first-order stationary point with O(n + √n/ϵ²) stochastic gradients. These results are almost optimal since Fang et al. [11] provided a lower bound Ω(√n/ϵ²) for finding even just an ϵ-first-order stationary point. We emphasize that the SSRGD algorithm for finding second-order stationary points is as simple as the one for finding first-order stationary points, differing only in occasionally adding a uniform perturbation, while all other algorithms for finding second-order stationary points with similar gradient complexity need to be combined with a negative-curvature search subroutine (e.g., Neon2 [4]). Moreover, the simple SSRGD algorithm admits a simpler analysis. (A sketch of this scheme appears after the table.)
Researcher Affiliation | Academia | Zhize Li, Tsinghua University, China and KAUST, Saudi Arabia; zhizeli.thu@gmail.com
Pseudocode | Yes | Algorithm 1: Simple Stochastic Recursive Gradient Descent (SSRGD); Algorithm 2: Simple Stochastic Recursive Gradient Descent (SSRGD)
Open Source Code | No | The paper does not provide any statement about releasing open-source code for the described methodology, nor does it include a link to a code repository.
Open Datasets | No | The paper is theoretical and discusses algorithm convergence for abstract problem forms (finite-sum and online/expectation problems). It does not mention or use any specific public or open datasets for training.
Dataset Splits | No | The paper is theoretical and does not describe any dataset splits (training, validation, test), as it does not conduct experiments on specific datasets.
Hardware Specification | No | The paper is theoretical and does not describe any experimental setup or hardware specifications (such as specific GPU/CPU models, memory, or cloud resources) used for running experiments.
Software Dependencies | No | The paper is theoretical and does not specify any software dependencies with version numbers (e.g., libraries, frameworks, or programming languages) required to reproduce experiments.
Experiment Setup | No | The paper does not provide specific experimental setup details such as concrete hyperparameter values for empirical runs, training configurations, or system-level settings. It discusses theoretical parameters like the step size and epoch length as part of its algorithm analysis, but not as part of an empirical setup.
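
The sketch below illustrates the idea the abstract describes: a SARAH-style recursive gradient estimator that restarts from a full gradient at each epoch, plus a uniform perturbation added when the gradient is small, which is the mechanism SSRGD uses to escape saddle points. This is a minimal illustration, not the paper's Algorithm 1/2: the toy finite-sum objective and the parameter values (eta, b, m, g_thres, r) are assumptions chosen for readability rather than the settings analyzed in the paper.

```python
# Minimal sketch of the SSRGD idea: recursive (SARAH-style) stochastic gradients
# plus an occasional uniform perturbation when the gradient is small.
import numpy as np

rng = np.random.default_rng(0)

# Toy nonconvex finite-sum problem: f(x) = (1/n) * sum_i f_i(x),
# with f_i(x) = 0.5 * x^T A_i x + b_i^T x (A_i symmetric, possibly indefinite).
n, d = 200, 10
A = rng.standard_normal((n, d, d))
A = (A + A.transpose(0, 2, 1)) / 2.0          # symmetrize each A_i
bvec = rng.standard_normal((n, d))

def grad_batch(x, idx):
    """Average gradient of the components f_i, i in idx: A_i x + b_i."""
    return (A[idx] @ x).mean(axis=0) + bvec[idx].mean(axis=0)

def full_grad(x):
    return grad_batch(x, np.arange(n))

def ssrgd_sketch(x0, eta=1e-3, b=16, m=20, epochs=50, g_thres=1e-2, r=1e-3):
    x = x0.copy()
    for _ in range(epochs):
        g = full_grad(x)
        if np.linalg.norm(g) <= g_thres:
            # Gradient is small: add a perturbation drawn uniformly from a
            # ball of radius r (the escape mechanism in the second-order variant).
            xi = rng.standard_normal(d)
            xi *= r * rng.uniform() ** (1.0 / d) / np.linalg.norm(xi)
            x = x + xi
            g = full_grad(x)
        v = g                                  # epoch starts from a full gradient
        for _ in range(m):
            x_new = x - eta * v
            idx = rng.choice(n, size=b, replace=False)
            # Recursive estimator: correct v with a minibatch gradient difference.
            v = grad_batch(x_new, idx) - grad_batch(x, idx) + v
            x = x_new
    return x

x_final = ssrgd_sketch(np.zeros(d))
print("final gradient norm:", np.linalg.norm(full_grad(x_final)))
```

The point of the sketch is the contrast the abstract emphasizes: the only change relative to the first-order version is the occasional uniform perturbation, with no negative-curvature search subroutine such as Neon2.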