Natasha 2: Faster Non-Convex Optimization Than SGD

Authors: Zeyuan Allen-Zhu

NeurIPS 2018 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | We design a stochastic algorithm to find ε-approximate local minima of any smooth nonconvex function in rate O(ε^{-3.25}), with only oracle access to stochastic gradients. The best result before this work was O(ε^{-4}) by stochastic gradient descent (SGD). Theorem 1 (informal). Natasha1.5 finds x with ‖∇f(x)‖ ≤ ε in rate T = Õ(1/ε^{10/3}). Theorem 2 (informal). Natasha2 finds x with ‖∇f(x)‖ ≤ ε and ∇²f(x) ⪰ −δI in rate T = Õ(1/δ^5 + 1/(δε^3) + 1/ε^{3.25}). In contrast, the convergence rate of SGD was T = Õ(poly(d)·ε^{-4}) [22]. Table 1: Comparison of online methods for finding ‖∇f(x)‖ ≤ ε. (A worked comparison of the two headline rates is sketched after this table.)
Researcher Affiliation | Industry | Zeyuan Allen-Zhu, Microsoft Research AI, zeyuan@csail.mit.edu
Pseudocode | Yes | Algorithm 1 Natasha1.5(F, x₀, B, T, α). Input: f(x) = (1/n) Σ_{i=1}^n f_i(x), starting vector x₀, epoch length B ∈ [n], epoch count T ≥ 1, learning rate α > 0. (An illustrative, hedged code sketch of this epoch structure follows the table.)
Open Source Code | No | The paper does not provide concrete access to source code for the methodology described.
Open Datasets | No | The paper is theoretical and does not mention using any datasets for training or evaluation.
Dataset Splits | No | The paper is theoretical and does not involve dataset splits for validation or other purposes.
Hardware Specification | No | The paper is theoretical and does not describe any specific hardware used for experiments.
Software Dependencies | No | The paper is theoretical and does not provide specific ancillary software details with version numbers.
Experiment Setup | No | The paper is theoretical and does not contain specific experimental setup details, hyperparameters, or training configurations.
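To make the reported improvement concrete, the short Python sketch below compares the stochastic-gradient query counts implied by the O(ε^{-4}) SGD rate and the O(ε^{-3.25}) Natasha2 rate at a few target accuracies. Constants and polylog factors hidden by the O and Õ notation are dropped, so the numbers are only illustrative, and the function names are ours, not the paper's.

# Illustrative comparison of the leading terms of the two rates quoted above:
# SGD needs on the order of eps^-4 stochastic gradients to reach an
# eps-approximate local minimum, while Natasha2 needs on the order of eps^-3.25.
def sgd_queries(eps: float) -> float:
    return eps ** -4.0

def natasha2_queries(eps: float) -> float:
    return eps ** -3.25

for eps in (1e-1, 1e-2, 1e-3):
    sgd, nat = sgd_queries(eps), natasha2_queries(eps)
    print(f"eps={eps:.0e}: SGD ~ {sgd:.2e}, Natasha2 ~ {nat:.2e}, ratio ~ {sgd / nat:.1f}x")

At ε = 10^{-2}, for example, the leading terms are 10^8 versus roughly 3.2·10^6, about a 32x gap that widens further as ε shrinks.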
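The pseudocode row above records only the Algorithm 1 signature. The Python sketch below is a minimal, hedged illustration of the SVRG-style epoch structure that the signature suggests (a gradient snapshot per epoch of length B, variance-reduced stochastic steps with learning rate α, repeated for T epochs). It is not the paper's exact Natasha1.5 procedure, which additionally splits epochs into sub-epochs and adds a regularization step; all names here (svrg_style_epochs, grad_i) are assumptions made for illustration.

import numpy as np

def svrg_style_epochs(grad_i, n, x0, B, T, alpha, rng=None):
    # grad_i(i, x): stochastic gradient of component f_i at point x.
    # n: number of components in f(x) = (1/n) * sum_i f_i(x).
    # Hedged sketch only: omits Natasha1.5's sub-epochs and regularization.
    rng = rng if rng is not None else np.random.default_rng(0)
    x = np.asarray(x0, dtype=float).copy()
    for _ in range(T):                          # T epochs
        snapshot = x.copy()
        full_grad = np.mean([grad_i(i, snapshot) for i in range(n)], axis=0)
        for _ in range(B):                      # B stochastic steps per epoch
            i = int(rng.integers(n))
            # variance-reduced gradient estimator (SVRG-style)
            g = grad_i(i, x) - grad_i(i, snapshot) + full_grad
            x = x - alpha * g
    return x

The design point being illustrated is variance reduction: the estimator grad_i(i, x) - grad_i(i, snapshot) + full_grad is unbiased for ∇f(x), and its variance shrinks as x approaches the snapshot, which is what allows methods of this family to improve on the ε^{-4} rate of plain SGD.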