The Complexity of Finding Stationary Points with Stochastic Gradient Descent
Authors: Yoel Drori, Ohad Shamir
ICML 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | We study the iteration complexity of stochastic gradient descent (SGD) for minimizing the gradient norm of smooth, possibly nonconvex functions. We provide several results, implying that the classical O(ϵ 4) upper bound (for making the average gradient norm less than ϵ) cannot be improved upon, unless a combination of additional assumptions is made. Notably, this holds even if we limit ourselves to convex quadratic functions. We also show that for nonconvex functions, the feasibility of minimizing gradients with SGD is surprisingly sensitive to the choice of optimality criteria. |
| Researcher Affiliation | Collaboration | 1Google Research 2Weizmann Institute of Science. |
| Pseudocode | No | The paper describes the SGD algorithm in textual form (e.g., 'xt+1 = xt ηt ( f(xt) + ξt)'), but does not include a formally structured pseudocode block or algorithm box. |
| Open Source Code | No | The paper does not contain any statement about releasing source code for the described theoretical work, nor does it provide any links to a code repository. |
| Open Datasets | No | This is a theoretical paper and does not describe experiments involving datasets or training procedures. |
| Dataset Splits | No | This is a theoretical paper and does not involve dataset splits (training, validation, test) for empirical evaluation. |
| Hardware Specification | No | This is a theoretical paper and does not describe experiments that would require hardware specifications. |
| Software Dependencies | No | This is a theoretical paper focused on mathematical analysis, and therefore does not list software dependencies or version numbers. |
| Experiment Setup | No | This is a theoretical paper and does not describe an experimental setup, hyperparameters, or training configurations. |