Escaping Saddle-Point Faster under Interpolation-like Conditions
Authors: Abhishek Roy, Krishnakumar Balasubramanian, Saeed Ghadimi, Prasant Mohapatra
NeurIPS 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | In this paper, we show that under over-parametrization several standard stochastic optimization algorithms escape saddle-points and converge to local-minimizers much faster. One of the fundamental aspects of over-parametrized models is that they are capable of interpolating the training data. We show that, under interpolation-like assumptions... the first-order oracle complexity of Perturbed Stochastic Gradient Descent (PSGD) algorithm to reach an ε-local-minimizer, matches the corresponding deterministic rate of O(1/ε²). |
| Researcher Affiliation | Academia | Abhishek Roy, Department of Statistics, University of California, Davis (abroy@ucdavis.edu); Krishnakumar Balasubramanian, Department of Statistics, University of California, Davis (kbala@ucdavis.edu); Saeed Ghadimi, Department of Management Sciences, University of Waterloo (sghadimi@uwaterloo.ca); Prasant Mohapatra, Department of Computer Science, University of California, Davis (pmohapatra@ucdavis.edu) |
| Pseudocode | Yes | Algorithm 1: Perturbed Stochastic Gradient Descent Algorithm (a hedged Python sketch of this template follows the table) |
| Open Source Code | No | The paper is theoretical and does not mention releasing any source code or providing links to a code repository for the described methodology. |
| Open Datasets | No | The paper is theoretical and does not involve empirical training on datasets. No public dataset information is provided. |
| Dataset Splits | No | The paper is theoretical and does not involve empirical validation on datasets. No information on training, validation, or test splits is provided. |
| Hardware Specification | No | The paper is theoretical and does not describe any experimental hardware specifications. |
| Software Dependencies | No | The paper is theoretical and does not mention any software dependencies with specific version numbers. |
| Experiment Setup | No | The paper is theoretical and does not describe any empirical experiment setup details such as hyperparameters or training configurations. |
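For intuition about the pseudocode row above, here is a minimal Python sketch of the perturbed-SGD template that Algorithm 1 refers to: run plain SGD, and when the stochastic gradient norm is small, inject a perturbation drawn uniformly from a ball to help escape a potential saddle point. All function names, hyperparameters, and default values (`grad_oracle`, `step_size`, `noise_radius`, `grad_tol`, `escape_interval`) are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def perturbed_sgd(grad_oracle, x0, step_size=1e-2, noise_radius=1e-1,
                  grad_tol=1e-3, escape_interval=100, n_iters=10_000,
                  rng=None):
    """Hedged sketch of Perturbed SGD (PSGD); hyperparameters are illustrative.

    grad_oracle(x) is assumed to return a stochastic gradient estimate at x.
    """
    rng = rng or np.random.default_rng()
    x = np.asarray(x0, dtype=float).copy()
    last_perturb = -escape_interval  # permit a perturbation right away
    for t in range(n_iters):
        g = grad_oracle(x)
        if np.linalg.norm(g) <= grad_tol and t - last_perturb >= escape_interval:
            # Near-stationary iterate: add noise drawn uniformly from a ball
            # of radius noise_radius to escape a potential saddle point.
            direction = rng.standard_normal(x.shape)
            direction /= np.linalg.norm(direction)
            radius = noise_radius * rng.uniform() ** (1.0 / x.size)
            x = x + radius * direction
            last_perturb = t
        else:
            x = x - step_size * g  # ordinary SGD step
    return x
```

The ball-uniform perturbation at near-stationary iterates is the standard perturbed-descent device for escaping saddle points; the paper's contribution is showing that, under interpolation-like assumptions, the stochastic first-order oracle complexity of this scheme to reach an ε-local-minimizer matches the deterministic O(1/ε²) rate.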