SPIDER: Near-Optimal Non-Convex Optimization via Stochastic Path-Integrated Differential Estimator

Authors: Cong Fang, Chris Junchi Li, Zhouchen Lin, Tong Zhang

NeurIPS 2018 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | In this paper, we propose a new technique named Stochastic Path-Integrated Differential EstimatoR (SPIDER), which can be used to track many deterministic quantities of interest with significantly reduced computational cost. Combining SPIDER with the method of normalized gradient descent, we propose SPIDER-SFO, which solves non-convex stochastic optimization problems using stochastic gradients only. We provide a few error-bound results on its convergence rates. Specifically, we prove that the SPIDER-SFO algorithm achieves a gradient computation cost of O(min(n^{1/2} ϵ^{-2}, ϵ^{-3})) to find an ϵ-approximate first-order stationary point. In addition, we prove that SPIDER-SFO nearly matches the algorithmic lower bound for finding a stationary point under the gradient Lipschitz assumption in the finite-sum setting. Our SPIDER technique can be further applied to find an (ϵ, O(ϵ^{0.5}))-approximate second-order stationary point at a gradient computation cost of O(min(n^{1/2} ϵ^{-2} + ϵ^{-2.5}, ϵ^{-3})).
Researcher Affiliation | Collaboration | Cong Fang (1), Chris Junchi Li (2), Zhouchen Lin (1), Tong Zhang (2). (1) Key Lab. of Machine Intelligence (MoE), School of EECS, Peking University; (2) Tencent AI Lab. {fangcong, zlin}@pku.edu.cn, junchi.li.duke@gmail.com, tongzhang@tongzhang-ml.org
Pseudocode | Yes | Algorithm 1 SPIDER-SFO: Input x0, q, S1, S2, n0, ϵ, and ϵ̃ (for finding a first-order stationary point). A hedged sketch of this loop is given after the table below.
Open Source Code | No | The paper refers readers to its long version at 'https://arxiv.org/abs/1807.01695' for details, but does not provide a link to, or a statement about, the availability of source code for the methodology.
Open Datasets | No | The paper does not use or specify a publicly available dataset with concrete access information. It discusses optimization problems (finite-sum and online cases) rather than empirical evaluation on specific datasets.
Dataset Splits | No | The paper is theoretical and does not involve empirical experiments, so no training/validation/test splits are reported.
Hardware Specification | No | The paper is theoretical and does not describe any specific hardware used for running experiments.
Software Dependencies | No | The paper is theoretical and does not mention specific software dependencies with version numbers.
Experiment Setup | No | The paper is theoretical and focuses on algorithm design and convergence proofs. It does not provide details on experimental setup such as hyperparameters or system-level training settings, as no experiments are described.
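
The rows above note that the paper provides pseudocode (Algorithm 1, SPIDER-SFO) but no released implementation. For readers who want a concrete picture of the method, below is a minimal NumPy sketch of the finite-sum SPIDER-SFO loop with the normalized-gradient step (Option II style output). The function name spider_sfo, the per-batch gradient oracle grad_f(x, idx), and the default parameter values are illustrative assumptions, not code from the authors.

```python
import numpy as np

def spider_sfo(grad_f, x0, n, L, eps, q=100, S1=None, S2=32, n0=1, K=1000, rng=None):
    """Minimal sketch of the SPIDER-SFO loop (finite-sum setting, normalized step).

    grad_f(x, idx) -- average gradient of the components f_i over index batch `idx`
    n              -- number of component functions
    L              -- gradient Lipschitz constant
    eps            -- target accuracy for the first-order stationary point
    q, S1, S2, n0  -- SPIDER parameters: refresh period and batch sizes
    """
    rng = np.random.default_rng() if rng is None else rng
    S1 = n if S1 is None else S1
    x = np.asarray(x0, dtype=float)
    x_prev, v = None, None
    iterates = []
    for k in range(K):
        if k % q == 0:
            # Refresh the estimator with a large batch (the full sum if S1 >= n).
            idx = np.arange(n) if S1 >= n else rng.choice(n, size=S1, replace=False)
            v = grad_f(x, idx)
        else:
            # Path-integrated correction: reuse v and adjust it with a small batch.
            idx = rng.choice(n, size=S2, replace=True)
            v = grad_f(x, idx) - grad_f(x_prev, idx) + v
        # Normalized gradient step: the update eta * v has norm eps / (L * n0).
        eta = eps / (L * n0 * max(np.linalg.norm(v), 1e-12))
        x_prev, x = x, x - eta * v
        iterates.append(x)
    # Option II style output: return an iterate chosen uniformly at random.
    return iterates[rng.integers(len(iterates))]
```

As a usage example under the same assumptions, a finite sum of squared losses could supply the oracle as grad_f = lambda x, idx: A[idx].T @ (A[idx] @ x - b[idx]) / len(idx). The periodic large-batch refresh every q steps keeps the error of the running estimate v bounded, which is what allows the small batch S2 to drive the per-iteration cost down toward the O(min(n^{1/2} ϵ^{-2}, ϵ^{-3})) rate quoted in the abstract.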