On the Local Minima of the Empirical Risk

Authors: Chi Jin, Lydia T. Liu, Rong Ge, Michael I. Jordan

NeurIPS 2018

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | In this paper, we answer both questions in the affirmative, establishing optimal dependencies between the error ν and the precision of a solution ϵ. We propose a simple algorithm based on SGD (Algorithm 1) that is guaranteed to find an approximate local minimum of F efficiently if ν ≤ O(ϵ^1.5/d), thus escaping all saddle points of F and all additional local minima introduced by f. Moreover, we provide a matching lower bound (up to logarithmic factors) for all algorithms making a polynomial number of queries of f. The lower bound shows that our algorithm achieves the optimal tradeoff between ν and ϵ, as well as the optimal dependence on dimension d. We also consider the information-theoretic limit for identifying an approximate local minimum of F regardless of the number of queries. We give a sharp information-theoretic threshold: ν = Θ(ϵ^1.5) (see Figure 2). As a concrete example of the application to minimizing population risk, we show that our results can be directly used to give sample complexities for learning a ReLU unit, whose empirical risk is nonsmooth while the population risk is smooth almost everywhere. [These regimes are restated in display form below the table.]
Researcher Affiliation | Academia | Chi Jin (University of California, Berkeley; chijin@cs.berkeley.edu), Lydia T. Liu (University of California, Berkeley; lydiatliu@cs.berkeley.edu), Rong Ge (Duke University; rongge@cs.duke.edu), Michael I. Jordan (University of California, Berkeley; jordan@cs.berkeley.edu)
Pseudocode | Yes | Algorithm 1: Zero-th order Perturbed Stochastic Gradient Descent (ZPSGD). A hedged Python sketch of this template appears below the table.
Open Source Code | No | The paper does not contain any explicit statement about making source code available, nor does it provide links to repositories.
Open Datasets | No | The paper describes a theoretical example of learning a ReLU unit in which data is assumed to be generated from a specified distribution, rather than drawn from any publicly available dataset.
Dataset Splits | No | The paper is theoretical and does not conduct empirical experiments that would involve dataset splits for training, validation, or testing.
Hardware Specification | No | The paper is theoretical and focuses on algorithmic guarantees and lower bounds. It does not describe any specific hardware used for running experiments.
Software Dependencies | No | The paper is theoretical and does not mention any specific software dependencies with version numbers that would be required to reproduce the work.
Experiment Setup | No | The paper is theoretical and does not describe an empirical experimental setup with concrete hyperparameter values or system-level training settings. The parameters for Algorithm 1 (ZPSGD) are given as theoretical relationships (e.g., learning rate η = 1/ℓ and smoothing radius σ = Θ(√(ϵ/(ρd)))) rather than fixed numerical values for a specific experiment; the sketch below the table shows one way these settings could be instantiated.
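
For reference, the closeness and accuracy regimes quoted in the Research Type row can be written in display form (symbols as in the paper: F is the population risk, f the empirical risk, ν their pointwise distance, ϵ the target accuracy, d the dimension):

```latex
% f is nu-pointwise close to the population risk F:
\sup_{x \in \mathbb{R}^d} \lvert f(x) - F(x) \rvert \le \nu
% Algorithmic regime (ZPSGD, polynomially many queries to f):
\nu \le O\!\left(\epsilon^{1.5}/d\right)
  \;\Longrightarrow\; \text{an } \epsilon\text{-approximate local minimum of } F \text{ is found efficiently}
% Information-theoretic threshold (any number of queries):
\nu = \Theta\!\left(\epsilon^{1.5}\right)
```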
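
Since the Pseudocode row points to Algorithm 1 (ZPSGD) without reproducing it, here is a minimal Python sketch of the zeroth-order perturbed SGD template it describes: a minibatch Gaussian-smoothing gradient estimator combined with a small uniform-ball perturbation. The settings η = 1/ℓ and σ = Θ(√(ϵ/(ρd))) follow the theoretical relationships quoted above; the minibatch size m, perturbation radius r, and step count are hypothetical placeholders, not values from the paper.

```python
import numpy as np

def zpsgd(f, x0, eps, ell, rho, m=100, r=1e-3, steps=1000, seed=0):
    """Sketch of Algorithm 1 (ZPSGD): zeroth-order perturbed SGD.

    f     -- zeroth-order oracle for the empirical risk (function values only)
    x0    -- initial point in R^d
    eps   -- target accuracy for the approximate local minimum
    ell   -- gradient-Lipschitz constant of the population risk F
    rho   -- Hessian-Lipschitz constant of F
    m, r, steps -- hypothetical defaults; the paper sets these via its theory
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    d = x.size
    eta = 1.0 / ell                    # learning rate eta = 1/ell
    sigma = np.sqrt(eps / (rho * d))   # smoothing radius sigma = Theta(sqrt(eps/(rho*d)))
    for _ in range(steps):
        # Minibatch estimate of the gradient of the Gaussian-smoothed risk:
        #   g = (1/m) * sum_i z_i * (f(x + z_i) - f(x)) / sigma^2,  z_i ~ N(0, sigma^2 I)
        z = rng.normal(scale=sigma, size=(m, d))
        fx = f(x)
        diffs = np.array([f(x + zi) - fx for zi in z])
        g = (z * diffs[:, None]).mean(axis=0) / sigma**2
        # Perturbation sampled uniformly from a ball of radius r, to help escape saddles.
        u = rng.normal(size=d)
        xi = (r * rng.uniform() ** (1.0 / d)) * u / np.linalg.norm(u)
        x = x - eta * (g + xi)
    return x
```

On a toy smooth objective this could be called as, e.g., `zpsgd(lambda x: np.sum(x**2), np.ones(5), eps=1e-2, ell=2.0, rho=1.0)`; for the paper's setting, f would be a nonsmooth empirical risk that is ν-pointwise close to a smooth population risk.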