SSRGD: Simple Stochastic Recursive Gradient Descent for Escaping Saddle Points
Authors: Zhize Li
NeurIPS 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | We analyze stochastic gradient algorithms for optimizing nonconvex problems. In particular, our goal is to find local minima (second-order stationary points) instead of just first-order stationary points, which may be bad unstable saddle points. We show that a simple perturbed version of the stochastic recursive gradient descent algorithm (called SSRGD) can find an (ϵ, δ)-second-order stationary point with Õ(√n/ϵ² + √n/δ⁴ + n/δ³) stochastic gradient complexity for nonconvex finite-sum problems. As a by-product, SSRGD finds an ϵ-first-order stationary point with O(n + √n/ϵ²) stochastic gradients. These results are almost optimal, since Fang et al. [11] provided a lower bound of Ω(√n/ϵ²) for finding even just an ϵ-first-order stationary point. We emphasize that the SSRGD algorithm for finding second-order stationary points is as simple as the one for finding first-order stationary points, just by adding a uniform perturbation sometimes, while all other algorithms for finding second-order stationary points with similar gradient complexity need to be combined with a negative-curvature search subroutine (e.g., Neon2 [4]). Moreover, the simple SSRGD algorithm admits a simpler analysis. |
| Researcher Affiliation | Academia | Zhize Li, Tsinghua University, China, and KAUST, Saudi Arabia; zhizeli.thu@gmail.com |
| Pseudocode | Yes | Algorithm 1 Simple Stochastic Recursive Gradient Descent (SSRGD); Algorithm 2 Simple Stochastic Recursive Gradient Descent (SSRGD) |
| Open Source Code | No | The paper does not provide any statement about releasing open-source code for the described methodology, nor does it include a link to a code repository. |
| Open Datasets | No | The paper is theoretical and discusses algorithm convergence for abstract problem forms (finite-sum and online/expectation problems). It does not mention or use any specific public or open datasets for training. |
| Dataset Splits | No | The paper is theoretical and does not describe any dataset splits (training, validation, test) as it does not conduct experiments on specific datasets. |
| Hardware Specification | No | The paper is theoretical and does not describe any experimental setup or hardware specifications (like specific GPU/CPU models, memory, or cloud resources) used for running experiments. |
| Software Dependencies | No | The paper is theoretical and does not specify any software dependencies with version numbers (e.g., libraries, frameworks, or programming languages) required to reproduce experiments. |
| Experiment Setup | No | The paper does not provide specific experimental setup details such as concrete hyperparameter values for empirical runs, training configurations, or system-level settings. It discusses theoretical parameters like step size and epoch length as part of its algorithm analysis, but not as part of an empirical setup. |
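
The abstract quoted in the table describes SSRGD as stochastic recursive gradient descent (a SARAH/SPIDER-style estimator) with a uniform perturbation added occasionally when the gradient is small. Below is a minimal sketch of that scheme in Python/NumPy, assuming a helper `grad_avg(x, idx)` that returns the average gradient of f_i over the index set `idx`. The hyperparameter defaults, the epoch-boundary placement of the perturbation check, and the omission of the paper's super-epoch bookkeeping are all simplifying assumptions for illustration, not the paper's theoretical settings.

```python
# Hedged sketch of the SSRGD idea for a finite-sum objective
# f(x) = (1/n) * sum_i f_i(x). All hyperparameters are illustrative.
import numpy as np

def ssrgd_sketch(grad_avg, x0, n, *, eta=0.05, batch=32, epoch_len=100,
                 n_epochs=20, g_thresh=1e-3, radius=1e-3, seed=None):
    """grad_avg(x, idx) -> (1/len(idx)) * sum_{i in idx} grad f_i(x)."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float).copy()
    for _ in range(n_epochs):
        # Epoch start: compute the full gradient.
        full_grad = grad_avg(x, np.arange(n))
        # Perturbation step: if the gradient is small, add noise drawn
        # uniformly from a ball of radius `radius` to help escape a possible
        # saddle point. (The paper's super-epoch stopping rules are omitted.)
        if np.linalg.norm(full_grad) <= g_thresh:
            xi = rng.normal(size=x.shape)
            xi /= np.linalg.norm(xi)                  # uniform direction
            xi *= radius * rng.uniform() ** (1.0 / x.size)  # uniform in ball
            x = x + xi
            full_grad = grad_avg(x, np.arange(n))     # restart at new point
        v = full_grad
        for _ in range(epoch_len):
            x_prev = x
            x = x - eta * v                           # descent step
            idx = rng.integers(0, n, size=batch)      # fresh minibatch
            # SARAH/SPIDER-style recursive gradient estimator:
            # v <- (1/b) * sum_{i in idx} (grad f_i(x) - grad f_i(x_prev)) + v
            v = grad_avg(x, idx) - grad_avg(x_prev, idx) + v
    return x
```

As a usage example, for the toy objective f_i(x) = ½‖x − a_i‖² with rows a_i stacked in a matrix `A`, one could pass `grad_avg = lambda x, idx: x - A[idx].mean(axis=0)`.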