Natasha: Faster Non-Convex Stochastic Optimization via Strongly Non-Convex Parameter
Authors: Zeyuan Allen-Zhu
ICML 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | Given a non-convex function f(x) that is an average of n smooth functions, we design stochastic first-order methods to find its approximate stationary points. The performance of our new methods depends on the smallest (negative) eigenvalue −σ of the Hessian. This parameter σ captures how strongly non-convex f(x) is, and is analogous to the strong convexity parameter for convex optimization. At least in theory, our methods outperform known results for a range of the parameter σ, and can also be used to find approximate local minima. Our result implies an interesting dichotomy: there exists a threshold σ_0 so that the (currently) fastest methods for σ > σ_0 and for σ < σ_0 have different behaviors: the former scales with n^{2/3} and the latter with n^{3/4}. |
| Researcher Affiliation | Collaboration | Microsoft Research. Correspondence to: Zeyuan Allen-Zhu <zeyuan@csail.mit.edu>. |
| Pseudocode | Yes | Algorithm 1 Natasha(x_0, p, T, α); a schematic sketch follows this table. |
| Open Source Code | No | The paper does not provide concrete access to source code (specific repository link, explicit code release statement, or code in supplementary materials) for the methodology described in this paper. |
| Open Datasets | No | This is a theoretical paper and does not involve empirical studies with specific datasets or their public availability. |
| Dataset Splits | No | This is a theoretical paper and does not involve empirical studies with specific dataset splits. |
| Hardware Specification | No | This is a theoretical paper and does not mention specific hardware used for running experiments. |
| Software Dependencies | No | This is a theoretical paper and does not mention specific software dependencies with version numbers. |
| Experiment Setup | No | This is a theoretical paper and does not describe an experimental setup with specific hyperparameters or training configurations. |
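Since the paper's contribution is purely algorithmic, a schematic rendering of its Algorithm 1 may make the table above concrete. In the paper, σ-strong non-convexity means ∇²f(x) ⪰ −σI; adding a proximal term σ‖x − center‖² shifts the Hessian spectrum up by 2σ, so each sub-problem becomes σ-strongly convex and amenable to SVRG-style variance reduction. The sketch below is a minimal reading of one epoch, not the author's code: the sub-epoch length m = T/p, the within-sub-epoch averaging, and the random-iterate output rule are assumptions inferred from the paper's high-level description, and `grad_i` is a hypothetical gradient oracle for the components f_i.

```python
import numpy as np

def natasha_epoch(grad_i, n, x0, p, T, alpha, sigma):
    """One epoch of a Natasha-style method (schematic sketch, not the
    paper's exact Algorithm 1). grad_i(i, x) returns the gradient of
    the i-th component f_i at x.
    """
    x = x0.copy()
    snapshot = x0.copy()
    # Full gradient at the snapshot, as in SVRG.
    mu = np.mean([grad_i(i, snapshot) for i in range(n)], axis=0)
    m = max(T // p, 1)  # iterations per sub-epoch (assumed split of T)
    iterates = []
    for _ in range(p):  # p sub-epochs per epoch
        center = x.copy()  # re-center the regularizer at the current point
        for _ in range(m):
            i = np.random.randint(n)
            # SVRG-style variance-reduced gradient of f at x ...
            g = grad_i(i, x) - grad_i(i, snapshot) + mu
            # ... plus the gradient of sigma * ||x - center||^2, which
            # shifts the Hessian by 2*sigma and makes the sub-problem
            # sigma-strongly convex.
            g += 2.0 * sigma * (x - center)
            x = x - alpha * g
            iterates.append(x.copy())
        x = np.mean(iterates[-m:], axis=0)  # sub-epoch averaging (assumed)
    # Output a uniformly random iterate, the usual rule for non-convex
    # stationary-point guarantees (assumed here).
    return iterates[np.random.randint(len(iterates))]
```

In the paper, repeating such epochs with fresh snapshots, and tuning the step size against σ, is what yields the n^{2/3} versus n^{3/4} dichotomy quoted in the abstract above.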