Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Natasha 2: Faster Non-Convex Optimization Than SGD
Authors: Zeyuan Allen-Zhu
NeurIPS 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | We design a stochastic algorithm to find ε-approximate local minima of any smooth nonconvex function in rate O(ε^{-3.25}), with only oracle access to stochastic gradients. The best result before this work was O(ε^{-4}) by stochastic gradient descent (SGD). Theorem 1 (informal). Natasha1.5 finds x with ‖∇f(x)‖ ≤ ε in rate T = O(1/ε^3). Theorem 2 (informal). Natasha2 finds x with ‖∇f(x)‖ ≤ ε and ∇²f(x) ⪰ −δI in rate T = Õ(1/δ^5 + 1/(δε^3) + 1/ε^{3.25}). In contrast, the convergence rate of SGD was T = Õ(poly(d)·ε^{-4}) [22]. Table 1: Comparison of online methods for finding ‖∇f(x)‖ ≤ ε. |
| Researcher Affiliation | Industry | Zeyuan Allen-Zhu Microsoft Research AI EMAIL |
| Pseudocode | Yes | Algorithm 1 Natasha1.5(F, x^∅, B, T, α). Input: f(x) = (1/n) Σ_{i=1}^n f_i(x), starting vector x^∅, epoch length B ∈ [n], epoch count T ≥ 1, learning rate α > 0. |
| Open Source Code | No | The paper does not provide concrete access to source code for the methodology described. |
| Open Datasets | No | The paper is theoretical and does not mention using any datasets for training or evaluation. |
| Dataset Splits | No | The paper is theoretical and does not involve dataset splits for validation or other purposes. |
| Hardware Specification | No | The paper is theoretical and does not describe any specific hardware used for experiments. |
| Software Dependencies | No | The paper is theoretical and does not provide specific ancillary software details with version numbers. |
| Experiment Setup | No | The paper is theoretical and does not contain specific experimental setup details, hyperparameters, or training configurations. |
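The Pseudocode row above records the interface of Algorithm 1, Natasha1.5(F, x^∅, B, T, α), for a finite-sum objective f(x) = (1/n) Σ f_i(x) with epoch length B, epoch count T, and learning rate α. A minimal sketch of an epoch-based variance-reduced loop with that same interface is given below; this is a generic SVRG-style illustration, not the paper's actual Natasha1.5 (which adds a proximal/regularization step), and the helper names are hypothetical.

```python
import numpy as np

def natasha15_sketch(grads, n, x0, B, T, alpha, rng=None):
    """SVRG-style epoch loop mirroring Natasha1.5's signature (hedged sketch).

    grads(i, x) returns the stochastic gradient of f_i at x, where
    f(x) = (1/n) * sum_i f_i(x); B is the epoch length, T the epoch
    count, and alpha the learning rate.
    """
    rng = np.random.default_rng(rng)
    x = np.asarray(x0, dtype=float)
    for _ in range(T):
        snapshot = x.copy()
        # Full gradient at the snapshot point (variance-reduction anchor).
        full_grad = sum(grads(i, snapshot) for i in range(n)) / n
        for _ in range(B):
            i = rng.integers(n)
            # Variance-reduced stochastic gradient estimate.
            v = grads(i, x) - grads(i, snapshot) + full_grad
            x = x - alpha * v
    return x

# Toy finite-sum objective: f_i(x) = 0.5 * ||x - c_i||^2, minimized at mean(c_i).
cs = np.array([[1.0, 2.0], [3.0, 0.0], [-1.0, 4.0]])
grads = lambda i, x: x - cs[i]
x_star = natasha15_sketch(grads, n=3, x0=np.zeros(2), B=20, T=30, alpha=0.1, rng=0)
# x_star ≈ [1.0, 2.0], the mean of the c_i.
```

For this quadratic toy problem all f_i share the same Hessian, so the variance-reduced estimate v equals the exact full gradient and the iterates contract deterministically toward the minimizer.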