Heavy-ball Algorithms Always Escape Saddle Points

Authors: Tao Sun, Dongsheng Li, Zhe Quan, Hao Jiang, Shengguo Li, Yong Dou

IJCAI 2019

Reproducibility Variable Result LLM Response
Research Type Theoretical In this paper, we answer a question: can nonconvex heavy-ball algorithms with random initialization avoid saddle points? The answer is yes! Directly applying the existing proof technique to heavy-ball algorithms is hard because each iteration involves both the current and the previous point, so the algorithm cannot be formulated as an iteration x_{k+1} = g(x_k) under some mapping g. To this end, we design a new mapping on a new space. With some transformations, the heavy-ball algorithm can be interpreted as iterations of this mapping. Theoretically, we prove that heavy-ball gradient descent enjoys a larger stepsize than gradient descent for escaping saddle points.
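The iteration described above, x_{k+1} = x_k - α∇f(x_k) + β(x_k - x_{k-1}), can be sketched numerically. The snippet below is an illustrative implementation, not the paper's code: the test function, stepsize α, and momentum β are hypothetical choices, and the comment on the product-space mapping G paraphrases the reformulation idea from the abstract.

```python
import numpy as np

def grad(x):
    # f(x) = (x1^2 - 1)^2 + x2^2 has a strict saddle at the origin
    # (the Hessian there has a negative eigenvalue) and minimizers at (+/-1, 0).
    return np.array([4.0 * x[0] * (x[0] ** 2 - 1.0), 2.0 * x[1]])

def heavy_ball(x0, alpha=0.05, beta=0.5, iters=500):
    # Heavy-ball iteration: x_{k+1} = x_k - alpha*grad(x_k) + beta*(x_k - x_{k-1}).
    # Because the update depends on two consecutive points, it is not of the
    # form x_{k+1} = g(x_k); on the product space one can instead view it as
    # a single-step mapping G(x_k, x_{k-1}) = (x_{k+1}, x_k), which is the kind
    # of reformulation the paper's analysis relies on.
    x_prev = x0.copy()
    x = x0.copy()
    for _ in range(iters):
        x_next = x - alpha * grad(x) + beta * (x - x_prev)
        x_prev, x = x, x_next
    return x

# Random initialization close to the saddle point at the origin.
rng = np.random.default_rng(0)
x_star = heavy_ball(rng.normal(scale=1e-3, size=2))
```

With a small random start near the saddle, the momentum term amplifies the component along the Hessian's negative-curvature direction, so the iterates drift away from the origin and settle at one of the minimizers (±1, 0) instead of the saddle.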
Researcher Affiliation Academia Tao Sun1, Dongsheng Li1, Zhe Quan2, Hao Jiang1, Shengguo Li1, and Yong Dou1; 1College of Computer, National University of Defense Technology; 2Hunan University. nudtsuntao@163.com, dsli@nudt.edu.cn, quanzhe@hnu.edu.cn, {haojiang, nudtlsg, yongdou}@nudt.edu.cn
Pseudocode No The paper does not contain structured pseudocode or algorithm blocks. The algorithms are presented as mathematical equations.
Open Source Code No The paper does not provide concrete access to source code for the methodology described. It is a theoretical paper with no mention of code release.
Open Datasets No The paper is theoretical and does not conduct experiments on datasets, thus it does not provide access information for a publicly available or open dataset for training.
Dataset Splits No The paper is theoretical and does not conduct experiments on datasets, thus it does not provide specific dataset split information for validation.
Hardware Specification No The paper is theoretical and does not describe any experiments that would require specific hardware. No hardware specifications are mentioned.
Software Dependencies No The paper is theoretical and does not conduct experiments, therefore it does not list specific software dependencies with version numbers.
Experiment Setup No The paper is theoretical and does not describe any empirical experiments, so no specific experimental setup details like hyperparameters or training configurations are provided.