SPIDER: Near-Optimal Non-Convex Optimization via Stochastic Path-Integrated Differential Estimator
Authors: Cong Fang, Chris Junchi Li, Zhouchen Lin, Tong Zhang
NeurIPS 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | In this paper, we propose a new technique named Stochastic Path-Integrated Differential Estimator (SPIDER), which can be used to track many deterministic quantities of interest with significantly reduced computational cost. Combining SPIDER with the method of normalized gradient descent, we propose SPIDER-SFO, which solves non-convex stochastic optimization problems using stochastic gradients only. We provide a few error-bound results on its convergence rates. Specifically, we prove that the SPIDER-SFO algorithm achieves a gradient computation cost of O(min(n^{1/2}ϵ^{-2}, ϵ^{-3})) to find an ϵ-approximate first-order stationary point. In addition, we prove that SPIDER-SFO nearly matches the algorithmic lower bound for finding a stationary point under the gradient Lipschitz assumption in the finite-sum setting. Our SPIDER technique can be further applied to find an (ϵ, O(ϵ^{0.5}))-approximate second-order stationary point at a gradient computation cost of O(min(n^{1/2}ϵ^{-2} + ϵ^{-2.5}, ϵ^{-3})). |
| Researcher Affiliation | Collaboration | Cong Fang (1), Chris Junchi Li (2), Zhouchen Lin (1), Tong Zhang (2). (1) Key Lab. of Machine Intelligence (MoE), School of EECS, Peking University; (2) Tencent AI Lab. Emails: {fangcong, zlin}@pku.edu.cn, junchi.li.duke@gmail.com, tongzhang@tongzhang-ml.org |
| Pseudocode | Yes | Algorithm 1 SPIDER-SFO: Input x_0, q, S_1, S_2, n_0, ϵ, and ϵ̃ (for finding a first-order stationary point); a simplified sketch is given after the table. |
| Open Source Code | No | The paper refers readers to a long version at https://arxiv.org/abs/1807.01695 for details, but provides no link to, or statement about, the availability of source code for the method. |
| Open Datasets | No | The paper does not use or specify a publicly available dataset with concrete access information. It discusses optimization problems (finite-sum and online cases) rather than empirical evaluation on specific datasets. |
| Dataset Splits | No | The paper is theoretical and does not involve empirical experiments with dataset splits. Therefore, it does not provide information about training/validation/test splits. |
| Hardware Specification | No | The paper is theoretical and does not describe any specific hardware used for running experiments. |
| Software Dependencies | No | The paper is theoretical and does not mention specific software dependencies with version numbers. |
| Experiment Setup | No | The paper is theoretical and focuses on algorithm design and convergence proofs. It does not provide details on experimental setup such as hyperparameters or system-level training settings, as no experiments are described. |
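
Since the paper provides only pseudocode (Algorithm 1) and no released implementation, the following is a minimal NumPy sketch of the SPIDER-SFO first-order loop as described in the abstract: a periodically refreshed, path-integrated differential gradient estimator combined with a normalized gradient descent step. The toy least-squares objective, the helper `grad_batch`, and all parameter choices (q, S_2, the stepsize) are illustrative assumptions, not the authors' code.

```python
import numpy as np

# Minimal sketch of the SPIDER-SFO first-order loop, assuming a finite-sum
# objective f(x) = (1/n) * sum_i f_i(x). The toy least-squares problem and
# the parameter choices are illustrative only, not taken from the paper.

rng = np.random.default_rng(0)
n, d = 1000, 20
A = rng.standard_normal((n, d))
b = rng.standard_normal(n)

def grad_batch(x, idx):
    """Mini-batch gradient of the least-squares loss over the rows in idx."""
    r = A[idx] @ x - b[idx]
    return A[idx].T @ r / len(idx)

def spider_sfo(x0, eps=1e-3, n_rounds=50):
    q = s2 = int(np.sqrt(n))               # epoch length and small batch size S_2
    lip = np.linalg.norm(A, 2) ** 2 / n    # crude Lipschitz constant estimate
    eta = eps / (2.0 * lip)                # O(eps / L) step length before normalization
    x = x_prev = x0.copy()
    v = np.zeros_like(x0)
    for k in range(n_rounds * q):
        if k % q == 0:
            # Refresh the estimator with the full (batch S_1) gradient.
            v = grad_batch(x, np.arange(n))
        else:
            # SPIDER recursion: path-integrated differential estimator.
            idx = rng.choice(n, size=s2, replace=False)
            v = grad_batch(x, idx) - grad_batch(x_prev, idx) + v
        if np.linalg.norm(v) <= 2 * eps:
            return x                        # approximate first-order stationary point
        x_prev = x
        x = x - eta * v / np.linalg.norm(v) # normalized gradient descent step
    return x

x_hat = spider_sfo(np.zeros(d))
```

The normalized step keeps each move at length of order ϵ/L, which is central to the constant-accuracy analysis; the paper's exact stepsize rule is slightly more refined than the fixed η used in this sketch.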