On the Local Minima of the Empirical Risk

Authors: Chi Jin, Lydia T. Liu, Rong Ge, Michael I. Jordan

NeurIPS 2018

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | In this paper, we answer both questions in the affirmative, establishing optimal dependencies between the error ν and the precision of a solution ϵ. We propose a simple algorithm based on SGD (Algorithm 1) that is guaranteed to find an approximate local minimum of F efficiently if ν ≤ O(ϵ^1.5/d), thus escaping all saddle points of F and all additional local minima introduced by f. Moreover, we provide a matching lower bound (up to logarithmic factors) for all algorithms making a polynomial number of queries of f. The lower bound shows that our algorithm achieves the optimal tradeoff between ν and ϵ, as well as the optimal dependence on dimension d. We also consider the information-theoretic limit for identifying an approximate local minimum of F regardless of the number of queries. We give a sharp information-theoretic threshold: ν = Θ(ϵ^1.5) (see Figure 2). As a concrete example of the application to minimizing population risk, we show that our results can be directly used to give sample complexities for learning a ReLU unit, whose empirical risk is nonsmooth while the population risk is smooth almost everywhere. [These regimes are restated in display form below the table.]
Researcher Affiliation | Academia | Chi Jin (University of California, Berkeley; chijin@cs.berkeley.edu), Lydia T. Liu (University of California, Berkeley; lydiatliu@cs.berkeley.edu), Rong Ge (Duke University; rongge@cs.duke.edu), Michael I. Jordan (University of California, Berkeley; jordan@cs.berkeley.edu)
Pseudocode | Yes | Algorithm 1: Zero-th order Perturbed Stochastic Gradient Descent (ZPSGD). A hedged Python sketch of this template appears below the table.
Open Source Code | No | The paper does not contain any explicit statement about making source code available, nor does it provide links to repositories.
Open Datasets | No | The paper describes a theoretical example of learning a ReLU unit in which data is assumed to be generated from a specified distribution, rather than drawn from any publicly available dataset.
Dataset Splits | No | The paper is theoretical and does not conduct empirical experiments that would involve dataset splits for training, validation, or testing.
Hardware Specification | No | The paper is theoretical and focuses on algorithmic guarantees and lower bounds. It does not describe any specific hardware used for running experiments.
Software Dependencies | No | The paper is theoretical and does not mention any specific software dependencies with version numbers that would be required to reproduce the work.
Experiment Setup | No | The paper is theoretical and does not describe an empirical experimental setup with concrete hyperparameter values or system-level training settings. The parameters for Algorithm 1 (ZPSGD) are given as theoretical relationships (e.g., learning rate η = 1/ℓ and smoothing radius σ = Θ(√(ϵ/(ρd)))) rather than fixed numerical values for a specific experiment; the sketch below the table shows one way these settings could be instantiated.
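
For reference, the closeness and accuracy regimes quoted in the Research Type row can be written in display form (symbols as in the paper: F is the population risk, f the empirical risk, ν their pointwise distance, ϵ the target accuracy, d the dimension):

```latex
% f is nu-pointwise close to the population risk F:
\sup_{x \in \mathbb{R}^d} \lvert f(x) - F(x) \rvert \le \nu
% Algorithmic regime (ZPSGD, polynomially many queries to f):
\nu \le O\!\left(\epsilon^{1.5}/d\right)
  \;\Longrightarrow\; \text{an } \epsilon\text{-approximate local minimum of } F \text{ is found efficiently}
% Information-theoretic threshold (any number of queries):
\nu = \Theta\!\left(\epsilon^{1.5}\right)
```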
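
Since the Pseudocode row points to Algorithm 1 (ZPSGD) without reproducing it, here is a minimal Python sketch of the zeroth-order perturbed SGD template it describes: a minibatch Gaussian-smoothing gradient estimator combined with a small uniform-ball perturbation. The settings η = 1/ℓ and σ = Θ(√(ϵ/(ρd))) follow the theoretical relationships quoted above; the minibatch size m, perturbation radius r, and step count are hypothetical placeholders, not values from the paper.

```python
import numpy as np

def zpsgd(f, x0, eps, ell, rho, m=100, r=1e-3, steps=1000, seed=0):
    """Sketch of Algorithm 1 (ZPSGD): zeroth-order perturbed SGD.

    f     -- zeroth-order oracle for the empirical risk (function values only)
    x0    -- initial point in R^d
    eps   -- target accuracy for the approximate local minimum
    ell   -- gradient-Lipschitz constant of the population risk F
    rho   -- Hessian-Lipschitz constant of F
    m, r, steps -- hypothetical defaults; the paper sets these via its theory
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    d = x.size
    eta = 1.0 / ell                    # learning rate eta = 1/ell
    sigma = np.sqrt(eps / (rho * d))   # smoothing radius sigma = Theta(sqrt(eps/(rho*d)))
    for _ in range(steps):
        # Minibatch estimate of the gradient of the Gaussian-smoothed risk:
        #   g = (1/m) * sum_i z_i * (f(x + z_i) - f(x)) / sigma^2,  z_i ~ N(0, sigma^2 I)
        z = rng.normal(scale=sigma, size=(m, d))
        fx = f(x)
        diffs = np.array([f(x + zi) - fx for zi in z])
        g = (z * diffs[:, None]).mean(axis=0) / sigma**2
        # Perturbation sampled uniformly from a ball of radius r, to help escape saddles.
        u = rng.normal(size=d)
        xi = (r * rng.uniform() ** (1.0 / d)) * u / np.linalg.norm(u)
        x = x - eta * (g + xi)
    return x
```

On a toy smooth objective this could be called as, e.g., `zpsgd(lambda x: np.sum(x**2), np.ones(5), eps=1e-2, ell=2.0, rho=1.0)`; for the paper's setting, f would be a nonsmooth empirical risk that is ν-pointwise close to a smooth population risk.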