Heavy-ball Algorithms Always Escape Saddle Points
Authors: Tao Sun, Dongsheng Li, Zhe Quan, Hao Jiang, Shengguo Li, Yong Dou
IJCAI 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | In this paper, we answer a question: can the nonconvex heavy-ball algorithms with random initialization avoid saddle points? The answer is yes! Direct using the existing proof technique for the heavy-ball algorithms is hard due to that each iteration of the heavy-ball algorithm consists of current and last points. It is impossible to formulate the algorithms as iteration like xk+1 = g(xk) under some mapping g. To this end, we design a new mapping on a new space. With some transfers, the heavy-ball algorithm can be interpreted as iterations after this mapping. Theoretically, we prove that heavy-ball gradient descent enjoys larger stepsize than the gradient descent to escape saddle points to escape the saddle point. |
| Researcher Affiliation | Academia | Tao Sun1 , Dongsheng Li1 , Zhe Quan2 , Hao Jiang1 , Shengguo Li1 and Yong Dou1 1 Computer of College, National University of Defense Technology 2 Hunan University nudtsuntao@163.com, dsli@nudt.edu.cn, quanzhe@hnu.edu.cn,{haojiang, nudtlsg, yongdou}@nudt.edu.cn |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. The algorithms are presented as mathematical equations. |
| Open Source Code | No | The paper does not provide concrete access to source code for the methodology described. It is a theoretical paper with no mention of code release. |
| Open Datasets | No | The paper is theoretical and does not conduct experiments on datasets, thus it does not provide access information for a publicly available or open dataset for training. |
| Dataset Splits | No | The paper is theoretical and does not conduct experiments on datasets, thus it does not provide specific dataset split information for validation. |
| Hardware Specification | No | The paper is theoretical and does not describe any experiments that would require specific hardware. No hardware specifications are mentioned. |
| Software Dependencies | No | The paper is theoretical and does not conduct experiments, therefore it does not list specific software dependencies with version numbers. |
| Experiment Setup | No | The paper is theoretical and does not describe any empirical experiments, so no specific experimental setup details like hyperparameters or training configurations are provided. |