How to Escape Saddle Points Efficiently

Authors: Chi Jin, Rong Ge, Praneeth Netrapalli, Sham M. Kakade, Michael I. Jordan

ICML 2017

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | This paper shows that a perturbed form of gradient descent converges to a second-order stationary point in a number of iterations that depends only poly-logarithmically on dimension (i.e., it is almost dimension-free). The convergence rate of this procedure matches the well-known convergence rate of gradient descent to first-order stationary points, up to log factors. The paper presents the first sharp analysis showing that (perturbed) gradient descent finds an approximate second-order stationary point in at most polylog(d) iterations, thus escaping all saddle points efficiently.
Researcher Affiliation | Collaboration | 1 University of California, Berkeley; 2 Duke University; 3 Microsoft Research India; 4 University of Washington
Pseudocode | Yes | Algorithm 1: Perturbed Gradient Descent (Meta-algorithm); Algorithm 2: Perturbed Gradient Descent, PGD(x0, ℓ, ρ, ϵ, c, δ, f); Algorithm 3: Perturbed Gradient Descent with Local Improvement, PGDli(x0, ℓ, ρ, ϵ, c, δ, f, β). A minimal sketch of Algorithm 2 is given after the table.
Open Source Code | No | The paper does not state that its source code is available and provides no link to a code repository.
Open Datasets | No | The paper is theoretical and conducts no empirical experiments involving datasets or training. The matrix factorization example is used for theoretical analysis, not empirical evaluation.
Dataset Splits | No | The paper is theoretical and describes no empirical experiments with training, validation, or test splits.
Hardware Specification | No | The paper is theoretical and describes no empirical experiments, so no hardware specifications are mentioned.
Software Dependencies | No | The paper focuses on theoretical analysis and algorithm design, not on implementation details that would require software dependencies with version numbers.
Experiment Setup | No | The paper focuses on theoretical analysis and does not describe an experimental setup with hyperparameters or system-level training settings.
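
Since the table flags the paper's pseudocode as its only algorithmic artifact, the following is a minimal NumPy sketch of the perturb-then-descend loop of Algorithm 2 (PGD). The objective and gradient callables `f` and `grad_f`, the `max_iter` cap, and the random-number-generator handling are illustrative additions; the threshold formulas mirror the paper's stated parameter choices only up to transcription. Treat this as a sketch under those assumptions, not a reference implementation.

```python
import numpy as np

def pgd(f, grad_f, x0, ell, rho, eps, c, delta, delta_f, max_iter=10_000, rng=None):
    """Sketch of Perturbed Gradient Descent (Algorithm 2, Jin et al., ICML 2017).

    f, grad_f : callables returning the objective value and its gradient.
    ell, rho  : gradient- and Hessian-Lipschitz constants.
    eps       : target accuracy; c, delta, delta_f : constants from the paper.
    The thresholds below follow the paper's parameter settings in spirit;
    the constants are illustrative, not tuned.
    """
    rng = np.random.default_rng() if rng is None else rng
    d = x0.shape[0]
    chi = 3 * max(np.log(d * ell * delta_f / (c * eps**2 * delta)), 4)
    eta = c / ell                                    # step size
    r = np.sqrt(c) * eps / (chi**2 * ell)            # perturbation radius
    g_thres = np.sqrt(c) * eps / chi**2              # small-gradient threshold
    f_thres = (c / chi**3) * np.sqrt(eps**3 / rho)   # required decrease after a perturbation
    t_thres = int(np.ceil((chi / c**2) * ell / np.sqrt(rho * eps)))  # steps to wait

    x = x0.copy()
    x_before_noise = x.copy()
    t_noise = -t_thres - 1
    for t in range(max_iter):
        g = grad_f(x)
        # Perturb only when the gradient is small and no perturbation was added recently.
        if np.linalg.norm(g) <= g_thres and t - t_noise > t_thres:
            x_before_noise = x.copy()
            t_noise = t
            xi = rng.normal(size=d)
            xi *= (r * rng.uniform() ** (1.0 / d)) / np.linalg.norm(xi)  # uniform in ball of radius r
            x = x + xi
            g = grad_f(x)
        # If the perturbation did not yield enough decrease, the pre-perturbation
        # point is (with high probability) an approximate second-order stationary point.
        if t - t_noise == t_thres and f(x) - f(x_before_noise) > -f_thres:
            return x_before_noise
        x = x - eta * g  # plain gradient-descent step
    return x
```

The loop is ordinary gradient descent except at near-stationary points, where a single uniform perturbation from a small ball is injected; the function-decrease check after t_thres steps is what certifies escape from a saddle versus arrival at an approximate local minimum.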