On the Local Minima of the Empirical Risk
Authors: Chi Jin, Lydia T. Liu, Rong Ge, Michael I. Jordan
NeurIPS 2018 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | In this paper, we answer both questions in the affirmative, establishing optimal dependencies between the error ν and the precision of a solution ϵ. We propose a simple algorithm based on SGD (Algorithm 1) that is guaranteed to find an approximate local minimum of F efficiently if ν ≤ O(ϵ^1.5/d), thus escaping all saddle points of F and all additional local minima introduced by f. Moreover, we provide a matching lower bound (up to logarithmic factors) for all algorithms making a polynomial number of queries of f. The lower bound shows that our algorithm achieves the optimal tradeoff between ν and ϵ, as well as the optimal dependence on dimension d. We also consider the information-theoretic limit for identifying an approximate local minimum of F regardless of the number of queries. We give a sharp information-theoretic threshold: ν = Θ(ϵ^1.5) (see Figure 2). As a concrete example of the application to minimizing population risk, we show that our results can be directly used to give sample complexities for learning a ReLU unit, whose empirical risk is nonsmooth while the population risk is smooth almost everywhere. |
| Researcher Affiliation | Academia | Chi Jin (University of California, Berkeley) chijin@cs.berkeley.edu; Lydia T. Liu (University of California, Berkeley) lydiatliu@cs.berkeley.edu; Rong Ge (Duke University) rongge@cs.duke.edu; Michael I. Jordan (University of California, Berkeley) jordan@cs.berkeley.edu |
| Pseudocode | Yes | Algorithm 1 Zero-th order Perturbed Stochastic Gradient Descent (ZPSGD) |
| Open Source Code | No | The paper does not contain any explicit statement about making source code available or provide links to repositories. |
| Open Datasets | No | The paper describes a theoretical example of learning a ReLU unit in which the data are assumed to be generated from a specified distribution; no public dataset is used. |
| Dataset Splits | No | The paper is theoretical and does not conduct empirical experiments that would involve dataset splits for training, validation, or testing. |
| Hardware Specification | No | The paper is theoretical and focuses on algorithmic guarantees and lower bounds. It does not describe any specific hardware used for running experiments. |
| Software Dependencies | No | The paper is theoretical and does not mention any specific software dependencies with version numbers that would be required to reproduce the work. |
| Experiment Setup | No | The paper is theoretical and does not describe an empirical experimental setup with concrete hyperparameter values or system-level training settings. The parameters for Algorithm 1 (ZPSGD) are described in terms of theoretical relationships (e.g., learning rate η = 1/ℓ, smoothing radius σ = Θ(√(ϵ/(ρd)))) rather than fixed numerical values for a specific experiment (an illustrative sketch of ZPSGD follows this table). |
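For concreteness, below is a minimal NumPy sketch of the procedure that the Pseudocode and Experiment Setup rows refer to (Algorithm 1, ZPSGD): a Gaussian-smoothing zeroth-order gradient estimate followed by a small random perturbation at each step. The function name `zpsgd`, the toy objective, and all numeric parameter values are illustrative assumptions; the paper specifies its parameters only through theoretical scalings such as η = 1/ℓ and σ = Θ(√(ϵ/(ρd))).

```python
import numpy as np

def zpsgd(f, x0, eta, r, sigma, m, T, rng=None):
    """Sketch of zeroth-order perturbed SGD (ZPSGD, Algorithm 1 in
    Jin et al., 2018). Concrete parameter values are assumptions; the
    paper gives only theoretical scalings, e.g. eta = 1/l and
    sigma = Theta(sqrt(eps/(rho*d))).
    """
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x0, dtype=float)
    d = x.size
    for _ in range(T):
        # Gaussian-smoothing gradient estimate: for z ~ N(0, sigma^2 I),
        # E[z * (f(x + z) - f(x))] / sigma^2 is the gradient of the
        # smoothed function E_z[f(x + z)], estimated here with m samples.
        z = rng.normal(scale=sigma, size=(m, d))
        fx = f(x)
        deltas = np.array([f(x + zi) - fx for zi in z])
        g = (z * deltas[:, None]).sum(axis=0) / (m * sigma**2)
        # Small perturbation drawn uniformly from a ball of radius r,
        # the mechanism that lets the iterates escape saddle points.
        u = rng.normal(size=d)
        u *= (rng.uniform() ** (1.0 / d)) * r / np.linalg.norm(u)
        x = x - eta * (g + u)
    return x

# Toy usage (hypothetical): a smooth quadratic "population risk" plus a
# small oscillatory term standing in for the pointwise-close, possibly
# nonsmooth "empirical risk" f.
f = lambda x: 0.5 * x @ x + 0.01 * np.sin(50.0 * x.sum())
x_hat = zpsgd(f, x0=np.ones(5), eta=0.1, r=0.01, sigma=0.1, m=100, T=500)
```

Since the paper reports no experiments, this sketch is purely illustrative of the algorithmic structure; it is not a reproduction of any result in the paper.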