Gradient Descent Can Take Exponential Time to Escape Saddle Points
Authors: Simon S. Du, Chi Jin, Jason D. Lee, Michael I. Jordan, Aarti Singh, Barnabás Póczos
NeurIPS 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | While our focus is theoretical, we also present experiments that illustrate our theoretical findings. |
| Researcher Affiliation | Academia | Simon S. Du (Carnegie Mellon University, ssdu@cs.cmu.edu); Chi Jin (University of California, Berkeley, chijin@berkeley.edu); Jason D. Lee (University of Southern California, jasonlee@marshall.usc.edu); Michael I. Jordan (University of California, Berkeley, jordan@cs.berkeley.edu); Barnabás Póczos (Carnegie Mellon University, bapoczos@cs.cmu.edu); Aarti Singh (Carnegie Mellon University, aartisingh@cmu.edu) |
| Pseudocode | Yes | Algorithm 1 Perturbed Gradient Descent [Jin et al., 2017] |
| Open Source Code | No | The paper does not provide any explicit statements about releasing source code or links to a code repository. |
| Open Datasets | No | The experiments use an objective function defined in the paper (equations 14 and 15 in the Appendix) rather than a publicly available dataset, so no dataset access information is provided. |
| Dataset Splits | No | The paper uses a custom-defined objective function for its experiments, not a standard dataset. Therefore, no information on training/validation/test splits is provided. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware used for running the experiments. |
| Software Dependencies | No | The paper specifies parameters such as the step size, $t_{\text{thres}}$, $g_{\text{thres}}$, and $r$, but does not name any software or version numbers used for the experiments. |
| Experiment Setup | Yes | For both GD and PGD we let the step size $\eta = 1/(4L)$. For PGD, we choose $t_{\text{thres}} = 1$, $g_{\text{thres}} = \gamma e/100$ and $r = e/100$. In Figure 3 we fix dimension $d = 5$ and vary $L$ as considered in Section 4.1; similarly in Figure 4 we choose $d = 10$ and vary $L$. |
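The table references Algorithm 1, the Perturbed Gradient Descent rule of Jin et al. [2017], and the step size $\eta = 1/(4L)$. The short Python sketch below illustrates how that perturbation rule behaves compared with plain GD. It does not reproduce the paper's experiment.

- The objective is a hypothetical toy strict saddle, $f(x, y) = x^2/2 + y^4/4 - y^2/2$. It is not the objective from equations 14 and 15.
- Only $t_{\text{thres}} = 1$ and $\eta = 1/(4L)$ mirror the reported setup. The values $L = 4$, $g_{\text{thres}} = 10^{-3}$ and $r = 10^{-2}$ are assumptions chosen for this toy.
- The helper names (`f`, `grad_f`, `gd`, `pgd`, `sample_ball`) are illustrative only.
- The sketch runs a fixed number of iterations with no stopping rule.

```python
import numpy as np


def f(z):
    # Hypothetical toy objective with a strict saddle at the origin and
    # minima at (0, +1) and (0, -1); NOT the paper's equations 14-15.
    x, y = z
    return 0.5 * x**2 + 0.25 * y**4 - 0.5 * y**2


def grad_f(z):
    # Gradient of the toy objective above.
    x, y = z
    return np.array([x, y**3 - y])


def gd(x0, eta, steps):
    # Plain gradient descent: x_{t+1} = x_t - eta * grad f(x_t).
    x = np.array(x0, dtype=float)
    for _ in range(steps):
        x = x - eta * grad_f(x)
    return x


def sample_ball(rng, d, r):
    # Uniform sample from the d-dimensional Euclidean ball of radius r:
    # a uniformly random direction scaled by r * U^(1/d), U ~ Uniform[0, 1).
    v = rng.standard_normal(d)
    v /= np.linalg.norm(v)
    return r * rng.uniform() ** (1.0 / d) * v


def pgd(x0, eta, steps, t_thres, g_thres, r, seed=0):
    # Perturbed GD sketch: if the gradient norm is at most g_thres and at
    # least t_thres + 1 steps have passed since the last perturbation,
    # add noise drawn uniformly from a ball of radius r; then take a GD step.
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)
    t_noise = -t_thres - 1  # lets a perturbation happen already at t = 0
    for t in range(steps):
        if np.linalg.norm(grad_f(x)) <= g_thres and t - t_noise > t_thres:
            x = x + sample_ball(rng, x.size, r)
            t_noise = t
        x = x - eta * grad_f(x)
    return x


if __name__ == "__main__":
    L = 4.0                # assumed smoothness constant for this toy
    eta = 1.0 / (4.0 * L)  # step size eta = 1/(4L), as in the reported setup
    x0 = [1.0, 0.0]        # starts on the saddle's stable manifold (y = 0)
    x_gd = gd(x0, eta, 2000)
    x_pgd = pgd(x0, eta, 2000, t_thres=1, g_thres=1e-3, r=1e-2)
    print("GD :", x_gd, f(x_gd))
    print("PGD:", x_pgd, f(x_pgd))
```

Starting plain GD at $(1, 0)$ places it exactly on the saddle's stable manifold, so GD converges to the saddle at the origin. The perturbation lets PGD leave it and settle near one of the minima. The paper's result concerns the harder case of random initialization, which this toy example does not attempt to reproduce.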