Escaping Saddle-Points Faster under Interpolation-like Conditions

Authors: Abhishek Roy, Krishnakumar Balasubramanian, Saeed Ghadimi, Prasant Mohapatra

NeurIPS 2020

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | In this paper, we show that under over-parametrization several standard stochastic optimization algorithms escape saddle-points and converge to local-minimizers much faster. One of the fundamental aspects of over-parametrized models is that they are capable of interpolating the training data. We show that, under interpolation-like assumptions... the first-order oracle complexity of the Perturbed Stochastic Gradient Descent (PSGD) algorithm to reach an ε-local-minimizer matches the corresponding deterministic rate of O(1/ε²).
Researcher Affiliation | Academia | Abhishek Roy, Department of Statistics, University of California, Davis (abroy@ucdavis.edu); Krishnakumar Balasubramanian, Department of Statistics, University of California, Davis (kbala@ucdavis.edu); Saeed Ghadimi, Department of Management Sciences, University of Waterloo (sghadimi@uwaterloo.ca); Prasant Mohapatra, Department of Computer Science, University of California, Davis (pmohapatra@ucdavis.edu)
Pseudocode | Yes | Algorithm 1: Perturbed Stochastic Gradient Descent (PSGD); an illustrative sketch of this template follows the table.
Open Source Code | No | The paper is theoretical and does not mention releasing any source code or providing links to a code repository for the described methodology.
Open Datasets | No | The paper is theoretical and does not involve empirical training on datasets. No public dataset information is provided.
Dataset Splits | No | The paper is theoretical and does not involve empirical validation on datasets. No information on training, validation, or test splits is provided.
Hardware Specification | No | The paper is theoretical and does not describe any experimental hardware specifications.
Software Dependencies | No | The paper is theoretical and does not mention any software dependencies with specific version numbers.
Experiment Setup | No | The paper is theoretical and does not describe any empirical experiment setup details such as hyperparameters or training configurations.
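
The Pseudocode row above refers to Algorithm 1 (Perturbed Stochastic Gradient Descent) in the paper. As a rough illustration of the generic perturbed-SGD template, here is a minimal Python sketch, assuming the usual recipe of injecting ball-shaped noise when the observed gradient is small; the function name, step size, perturbation radius, gradient threshold, and toy objective are illustrative assumptions, not the paper's exact procedure or constants.

import numpy as np

def perturbed_sgd(grad_fn, x0, n_iters=5000, step_size=0.05,
                  perturb_radius=0.1, grad_threshold=1e-3, seed=0):
    """Sketch of perturbed SGD: follow stochastic gradients and, whenever the
    observed gradient is small (a possible saddle region), add noise drawn
    uniformly from a small ball so the iterate can escape."""
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)
    for _ in range(n_iters):
        g = grad_fn(x)  # (stochastic) first-order oracle call
        if np.linalg.norm(g) <= grad_threshold:
            # Perturbation step: uniform sample from a ball of radius
            # perturb_radius (Gaussian direction, radius scaled by U^(1/d)).
            direction = rng.normal(size=x.shape)
            direction /= np.linalg.norm(direction)
            x = x + perturb_radius * rng.uniform() ** (1.0 / x.size) * direction
            g = grad_fn(x)
        x = x - step_size * g  # standard (S)GD update
    return x

# Toy usage: f(x) = (x1^2 - 1)^2 + x2^2 has a strict saddle at the origin and
# local minimizers at (+-1, 0); started at the origin, unperturbed gradient
# descent stalls, while the perturbation lets the iterate reach a minimizer.
toy_grad = lambda x: np.array([4.0 * x[0] * (x[0] ** 2 - 1.0), 2.0 * x[1]])
print(perturbed_sgd(toy_grad, x0=[0.0, 0.0]))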