SPIDER: Near-Optimal Non-Convex Optimization via Stochastic Path-Integrated Differential Estimator

Authors: Cong Fang, Chris Junchi Li, Zhouchen Lin, Tong Zhang

NeurIPS 2018 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | In this paper, we propose a new technique named Stochastic Path-Integrated Differential EstimatoR (SPIDER), which can be used to track many deterministic quantities of interest with significantly reduced computational cost. Combining SPIDER with the method of normalized gradient descent, we propose SPIDER-SFO, which solves non-convex stochastic optimization problems using stochastic gradients only. We provide a few error-bound results on its convergence rates. Specifically, we prove that the SPIDER-SFO algorithm achieves a gradient computation cost of O(min(n^{1/2} ϵ^{-2}, ϵ^{-3})) to find an ϵ-approximate first-order stationary point. In addition, we prove that SPIDER-SFO nearly matches the algorithmic lower bound for finding a stationary point under the gradient Lipschitz assumption in the finite-sum setting. Our SPIDER technique can be further applied to find an (ϵ, O(ϵ^{0.5}))-approximate second-order stationary point at a gradient computation cost of O(min(n^{1/2} ϵ^{-2} + ϵ^{-2.5}, ϵ^{-3})).
Researcher Affiliation | Collaboration | Cong Fang (1), Chris Junchi Li (2), Zhouchen Lin (1), Tong Zhang (2). (1) Key Lab. of Machine Intelligence (MoE), School of EECS, Peking University; (2) Tencent AI Lab. {fangcong, zlin}@pku.edu.cn, junchi.li.duke@gmail.com, tongzhang@tongzhang-ml.org
Pseudocode | Yes | Algorithm 1 SPIDER-SFO: Input x0, q, S1, S2, n0, ϵ, and ϵ̃ (for finding a first-order stationary point). A hedged sketch of this loop is given after the table below.
Open Source Code | No | The paper refers readers to its long version at 'https://arxiv.org/abs/1807.01695' for details, but does not provide a link to, or a statement about, the availability of source code for the methodology.
Open Datasets | No | The paper does not use or specify a publicly available dataset with concrete access information. It discusses optimization problems (finite-sum and online cases) rather than empirical evaluation on specific datasets.
Dataset Splits | No | The paper is theoretical and does not involve empirical experiments, so no training/validation/test splits are reported.
Hardware Specification | No | The paper is theoretical and does not describe any specific hardware used for running experiments.
Software Dependencies | No | The paper is theoretical and does not mention specific software dependencies with version numbers.
Experiment Setup | No | The paper is theoretical and focuses on algorithm design and convergence proofs. It does not provide details on experimental setup such as hyperparameters or system-level training settings, as no experiments are described.
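
The rows above note that the paper provides pseudocode (Algorithm 1, SPIDER-SFO) but no released implementation. For readers who want a concrete picture of the method, below is a minimal NumPy sketch of the finite-sum SPIDER-SFO loop with the normalized-gradient step (Option II style output). The function name spider_sfo, the per-batch gradient oracle grad_f(x, idx), and the default parameter values are illustrative assumptions, not code from the authors.

```python
import numpy as np

def spider_sfo(grad_f, x0, n, L, eps, q=100, S1=None, S2=32, n0=1, K=1000, rng=None):
    """Minimal sketch of the SPIDER-SFO loop (finite-sum setting, normalized step).

    grad_f(x, idx) -- average gradient of the components f_i over index batch `idx`
    n              -- number of component functions
    L              -- gradient Lipschitz constant
    eps            -- target accuracy for the first-order stationary point
    q, S1, S2, n0  -- SPIDER parameters: refresh period and batch sizes
    """
    rng = np.random.default_rng() if rng is None else rng
    S1 = n if S1 is None else S1
    x = np.asarray(x0, dtype=float)
    x_prev, v = None, None
    iterates = []
    for k in range(K):
        if k % q == 0:
            # Refresh the estimator with a large batch (the full sum if S1 >= n).
            idx = np.arange(n) if S1 >= n else rng.choice(n, size=S1, replace=False)
            v = grad_f(x, idx)
        else:
            # Path-integrated correction: reuse v and adjust it with a small batch.
            idx = rng.choice(n, size=S2, replace=True)
            v = grad_f(x, idx) - grad_f(x_prev, idx) + v
        # Normalized gradient step: the update eta * v has norm eps / (L * n0).
        eta = eps / (L * n0 * max(np.linalg.norm(v), 1e-12))
        x_prev, x = x, x - eta * v
        iterates.append(x)
    # Option II style output: return an iterate chosen uniformly at random.
    return iterates[rng.integers(len(iterates))]
```

As a usage example under the same assumptions, a finite sum of squared losses could supply the oracle as grad_f = lambda x, idx: A[idx].T @ (A[idx] @ x - b[idx]) / len(idx). The periodic large-batch refresh every q steps keeps the error of the running estimate v bounded, which is what allows the small batch S2 to drive the per-iteration cost down toward the O(min(n^{1/2} ϵ^{-2}, ϵ^{-3})) rate quoted in the abstract.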