Natasha 2: Faster Non-Convex Optimization Than SGD
Authors: Zeyuan Allen-Zhu
NeurIPS 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | We design a stochastic algorithm to find ε-approximate local minima of any smooth nonconvex function in rate O(ε^{-3.25}), with only oracle access to stochastic gradients. The best result before this work was O(ε^{-4}) by stochastic gradient descent (SGD). Theorem 1 (informal). Natasha1.5 finds x with ‖∇f(x)‖ ≤ ε in rate T = O(1/ε³). Theorem 2 (informal). Natasha2 finds x with ‖∇f(x)‖ ≤ ε and ∇²f(x) ⪰ −δI in rate T = Õ(1/δ⁵ + 1/(δε³) + 1/ε^{3.25}). In contrast, the convergence rate of SGD was T = Õ(poly(d)·ε^{-4}) [22]. Table 1: Comparison of online methods for finding ‖∇f(x)‖ ≤ ε. (A worked calculation of how the ε^{-3.25} rate follows from Theorem 2 is sketched after the table.) |
| Researcher Affiliation | Industry | Zeyuan Allen-Zhu, Microsoft Research AI, zeyuan@csail.mit.edu |
| Pseudocode | Yes | Algorithm 1 Natasha1.5(F, x, B, T, α). Input: f(·) = (1/n) Σᵢ₌₁ⁿ fᵢ(x), starting vector x, epoch length B ∈ [n], epoch count T ≥ 1, learning rate α > 0. (A simplified illustrative sketch of the epoch structure appears after the table.) |
| Open Source Code | No | The paper does not provide concrete access to source code for the methodology described. |
| Open Datasets | No | The paper is theoretical and does not mention using any datasets for training or evaluation. |
| Dataset Splits | No | The paper is theoretical and does not involve dataset splits for validation or other purposes. |
| Hardware Specification | No | The paper is theoretical and does not describe any specific hardware used for experiments. |
| Software Dependencies | No | The paper is theoretical and does not provide specific ancillary software details with version numbers. |
| Experiment Setup | No | The paper is theoretical and does not contain specific experimental setup details, hyperparameters, or training configurations. |
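
To see how the headline O(ε^{-3.25}) rate for approximate local minima follows from Theorem 2, one can balance the second-order tolerance δ against ε. The following LaTeX snippet works through the choice δ = ε^{1/4}; this balancing step is my reading of the bound's structure, not a line quoted from the paper.

```latex
% Theorem 2: T = \tilde{O}\bigl(1/\delta^5 + 1/(\delta \varepsilon^3) + 1/\varepsilon^{3.25}\bigr).
% Choosing \delta = \varepsilon^{1/4} makes no term exceed \varepsilon^{-3.25}:
\[
\frac{1}{\delta^5} = \varepsilon^{-5/4}, \qquad
\frac{1}{\delta \varepsilon^3} = \varepsilon^{-1/4 - 3} = \varepsilon^{-3.25}, \qquad
\frac{1}{\varepsilon^{3.25}} = \varepsilon^{-3.25},
\]
\[
\Rightarrow\; T = \tilde{O}\!\left(\varepsilon^{-3.25}\right),
\quad\text{compared with } T = \tilde{O}\!\left(\mathrm{poly}(d)\,\varepsilon^{-4}\right)\text{ for SGD.}
\]
```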
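
The quoted pseudocode is only the header of Algorithm 1; the full procedure, with its proximal sub-epochs, is in the paper. As a rough illustration of the epoch/snapshot structure that Natasha1.5 shares with SVRG-style variance reduction, here is a minimal Python sketch. The objective, the helpers `grad_i` and `svrg_style_epoch`, and all constants are hypothetical; this is not the authors' implementation and it omits the sub-epoch regularization that distinguishes Natasha1.5.

```python
import numpy as np

# Hypothetical finite-sum objective f(x) = (1/n) * sum_i f_i(x);
# each f_i is a simple least-squares term purely for illustration.
rng = np.random.default_rng(0)
n, d = 100, 10
A = rng.standard_normal((n, d))
b = rng.standard_normal(n)

def grad_i(x, i):
    """Stochastic gradient of the i-th component f_i(x) = 0.5*(a_i.x - b_i)^2."""
    return (A[i] @ x - b[i]) * A[i]

def svrg_style_epoch(x, B, m, alpha):
    """One epoch of SVRG-style variance reduction: a simplified stand-in for
    Natasha1.5's epoch structure (the real algorithm adds proximal sub-epochs)."""
    snapshot = x.copy()
    batch = rng.choice(n, size=B, replace=False)
    # Snapshot gradient estimated from a minibatch of size B.
    mu = np.mean([grad_i(snapshot, i) for i in batch], axis=0)
    for _ in range(m):
        i = rng.integers(n)
        # Variance-reduced estimator: conditionally unbiased, with variance
        # that shrinks as x approaches the snapshot point.
        g = grad_i(x, i) - grad_i(snapshot, i) + mu
        x = x - alpha * g
    return x

x = np.zeros(d)
for epoch in range(50):
    x = svrg_style_epoch(x, B=20, m=50, alpha=0.05)
print("final gradient norm:", np.linalg.norm(A.T @ (A @ x - b) / n))
```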