Convergence of Cubic Regularization for Nonconvex Optimization under KL Property

Authors: Yi Zhou, Zhe Wang, Yingbin Liang

NeurIPS 2018

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | In this paper, we explore the asymptotic convergence rate of CR by exploiting the ubiquitous Kurdyka-Łojasiewicz (KŁ) property of nonconvex objective functions. Specifically, we characterize the asymptotic convergence rate of various types of optimality measures for CR, including the function value gap, variable distance gap, gradient norm, and least eigenvalue of the Hessian matrix. Our results fully characterize the diverse convergence behaviors of these optimality measures in the full parameter regime of the KŁ property. Moreover, we show that the obtained asymptotic convergence rates of CR are order-wise faster than those of first-order gradient descent algorithms under the KŁ property. (The KŁ inequality is sketched after the table.)
Researcher Affiliation | Academia | Yi Zhou, Department of ECE, The Ohio State University (zhou.1172@osu.edu); Zhe Wang, Department of ECE, The Ohio State University (wang.10982@osu.edu); Yingbin Liang, Department of ECE, The Ohio State University (liang.889@osu.edu)
Pseudocode | No | No pseudocode or clearly labeled algorithm block was found; the algorithm update rule is presented only as a mathematical equation (1). (A sketch of this update appears after the table.)
Open Source Code | No | The paper does not contain any statement or link indicating that open-source code for the described methodology is provided.
Open Datasets | No | The paper is purely theoretical and does not involve experiments with datasets.
Dataset Splits | No | The paper is purely theoretical and does not involve experiments with datasets or specify any training/validation/test splits.
Hardware Specification | No | The paper describes theoretical work and does not mention any specific hardware used for experiments.
Software Dependencies | No | The paper describes theoretical work and does not specify any software dependencies with version numbers.
Experiment Setup | No | The paper is theoretical and does not include details on experimental setup such as hyperparameters or system-level training settings.
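For readers unfamiliar with the property named in the abstract, here is a minimal sketch of the Kurdyka-Łojasiewicz inequality in its common exponent form; conventions for the exponent vary across papers, so the paper's exact statement and parameter regimes should be consulted.

```latex
% KL property (exponent form, a standard sketch): near a critical
% point x^* there exist c > 0 and an exponent \theta \in (0, 1]
% such that
\[
  \bigl| f(x) - f(x^*) \bigr|^{\theta} \;\le\; c \,\bigl\| \nabla f(x) \bigr\|.
\]
% The paper's rates are stated over regimes of the KL parameter:
% the asymptotic behavior of the function value gap f(x_k) - f(x^*)
% changes qualitatively depending on where the exponent falls.
```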
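The update referred to above as equation (1) is a cubic-regularization (CR) step; sketched below in the usual Nesterov-Polyak form (the paper's scaling of the penalty constant may differ):

```latex
\[
  x_{k+1} \in \operatorname*{argmin}_{y \in \mathbb{R}^d}
  \;\langle \nabla f(x_k),\, y - x_k \rangle
  + \tfrac{1}{2}\,(y - x_k)^{\top} \nabla^2 f(x_k)\,(y - x_k)
  + \tfrac{M}{6}\,\| y - x_k \|^{3},
\]
% where M > 0 upper-bounds the Lipschitz constant of \nabla^2 f.
```

Since the paper provides no pseudocode, the following is a hypothetical Python sketch of one such step, not the authors' code: the function name, tolerance, and the bisection-based subproblem solver are our illustration, and the degenerate "hard case" of the subproblem is ignored.

```python
import numpy as np

def cubic_reg_step(g, H, M, tol=1e-10):
    """Globally minimize the cubic model
        m(s) = g @ s + 0.5 * s @ H @ s + (M / 6) * ||s||**3
    via the optimality condition (H + (M/2)*r*I) s = -g with r = ||s||
    and H + (M/2)*r*I positive semidefinite, found by bisection on r.
    The degenerate 'hard case' is not handled."""
    g = np.asarray(g, dtype=float)
    H = np.asarray(H, dtype=float)
    I = np.eye(H.shape[0])
    lam_min = np.linalg.eigvalsh(H)[0]
    r_lo = max(0.0, -2.0 * lam_min / M)  # shift keeping the matrix PSD

    def step_norm(r):
        # Norm of the candidate step at shift r (matrix is PD for r > r_lo).
        return np.linalg.norm(np.linalg.solve(H + 0.5 * M * r * I, -g))

    # Bracket the fixed point ||s(r)|| = r from above.
    r_hi = r_lo + 1.0
    while step_norm(r_hi) > r_hi:
        r_hi *= 2.0
    # Bisection: ||s(r)|| - r is strictly decreasing on (r_lo, inf).
    while r_hi - r_lo > tol:
        r = 0.5 * (r_lo + r_hi)
        if step_norm(r) > r:
            r_lo = r
        else:
            r_hi = r
    return np.linalg.solve(H + 0.5 * M * r_hi * I, -g)

# Hypothetical usage: one CR step on f(x) = 0.5 * x @ A @ x, A indefinite.
A = np.array([[1.0, 0.0], [0.0, -2.0]])
x = np.array([1.0, 1.0])
x = x + cubic_reg_step(A @ x, A, M=10.0)
```

The bisection over the shift r is one standard way to solve the cubic subproblem exactly in small dimensions; large-scale implementations typically use Krylov or gradient-based inexact solvers instead.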