How to Escape Saddle Points Efficiently
Authors: Chi Jin, Rong Ge, Praneeth Netrapalli, Sham M. Kakade, Michael I. Jordan
ICML 2017 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | This paper shows that a perturbed form of gradient descent converges to a second-order stationary point in a number iterations which depends only poly-logarithmically on dimension (i.e., it is almost dimension-free ). The convergence rate of this procedure matches the well-known convergence rate of gradient descent to first-order stationary points, up to log factors. This paper presents the first sharp analysis that shows that (perturbed) gradient descent finds an approximate second-order stationary point in at most polylog(d) iterations, thus escaping all saddle points efficiently. |
| Researcher Affiliation | Collaboration | 1University of California, Berkeley 2Duke University 3Microsoft Research India 4University of Washington. |
| Pseudocode | Yes | Algorithm 1 Perturbed Gradient Descent (Meta-algorithm); Algorithm 2 Perturbed Gradient Descent: PGD(x0, ℓ, ρ, ϵ, c, δ, f); Algorithm 3 Perturbed Gradient Descent with Local Improvement: PGDli(x0, ℓ, ρ, ϵ, c, δ, f, β) |
| Open Source Code | No | The paper does not provide any statement about making its source code available or provide a link to a code repository. |
| Open Datasets | No | The paper is theoretical and does not conduct empirical experiments involving datasets or training. The example of Matrix Factorization is used for theoretical analysis, not empirical evaluation. |
| Dataset Splits | No | The paper is theoretical and does not describe empirical experiments with data splits for training, validation, or testing. |
| Hardware Specification | No | The paper is theoretical and does not describe any empirical experiments, therefore no specific hardware specifications are mentioned. |
| Software Dependencies | No | The paper focuses on theoretical analysis and algorithm design, not on implementation details that would require specific software dependencies with version numbers. |
| Experiment Setup | No | The paper focuses on theoretical analysis and does not describe an experimental setup with hyperparameters or system-level training settings. |