Global Convergence of Langevin Dynamics Based Algorithms for Nonconvex Optimization
Authors: Pan Xu, Jinghui Chen, Difan Zou, Quanquan Gu
NeurIPS 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | We present a unified framework to analyze the global convergence of Langevin dynamics based algorithms for nonconvex finite-sum optimization with n component functions. At the core of our analysis is a direct analysis of the ergodicity of the numerical approximations to Langevin dynamics, which leads to faster convergence rates. Specifically, we show that gradient Langevin dynamics (GLD) and stochastic gradient Langevin dynamics (SGLD) converge to the almost minimizer within Õ(nd/(λε)) and Õ(d^7/(λ^5 ε^5)) stochastic gradient evaluations respectively, where d is the problem dimension, and λ is the spectral gap of the Markov chain generated by GLD. Both results improve upon the best known gradient complexity results [45]. Furthermore, for the first time we prove the global convergence guarantee for variance reduced stochastic gradient Langevin dynamics (SVRG-LD) to the almost minimizer within Õ(√n d^5/(λ^4 ε^(5/2))) stochastic gradient evaluations, which outperforms the gradient complexities of GLD and SGLD in a wide regime. Our theoretical analyses shed some light on using Langevin dynamics based algorithms for nonconvex optimization with provable guarantees. |
| Researcher Affiliation | Academia | Pan Xu, Department of Computer Science, UCLA, Los Angeles, CA 90095, panxu@cs.ucla.edu; Jinghui Chen, Department of Computer Science, University of Virginia, Charlottesville, VA 22903, jc4zg@virginia.edu; Difan Zou, Department of Computer Science, UCLA, Los Angeles, CA 90095, knowzou@cs.ucla.edu; Quanquan Gu, Department of Computer Science, UCLA, Los Angeles, CA 90095, qgu@cs.ucla.edu |
| Pseudocode | Yes | Algorithm 1 Gradient Langevin Dynamics (GLD) — input: step size η > 0; inverse temperature parameter β > 0; X_0 = 0. for k = 0, 1, ..., K−1 do: randomly draw ε_k ~ N(0, I_{d×d}); X_{k+1} = X_k − η∇F_n(X_k) + √(2η/β) ε_k; end for. Algorithm 2 Stochastic Gradient Langevin Dynamics (SGLD) — input: step size η > 0; batch size B; inverse temperature parameter β > 0; Y_0 = 0. for k = 0, 1, ..., K−1 do: randomly pick a subset I_k from {1, ..., n} of size |I_k| = B; randomly draw ε_k ~ N(0, I_{d×d}); Y_{k+1} = Y_k − (η/B) Σ_{i∈I_k} ∇f_i(Y_k) + √(2η/β) ε_k; end for. Algorithm 3 Stochastic Variance Reduced Gradient Langevin Dynamics (SVRG-LD) — input: step size η > 0; batch size B; epoch length L; inverse temperature parameter β > 0; initialization: Z_0 = 0, Z̃^(0) = Z_0. for s = 0, 1, ..., (K/L)−1 do: W̃ = ∇F_n(Z̃^(s)); for ℓ = 0, ..., L−1 do: k = sL + ℓ; randomly pick a subset I_k from {1, ..., n} of size |I_k| = B; draw ε_k ~ N(0, I_{d×d}); ∇̃_k = (1/B) Σ_{i_k∈I_k} (∇f_{i_k}(Z_k) − ∇f_{i_k}(Z̃^(s))) + W̃; Z_{k+1} = Z_k − η∇̃_k + √(2η/β) ε_k; end for; Z̃^(s+1) = Z_{(s+1)L}; end for. (A minimal Python sketch of these updates appears after the table.) |
| Open Source Code | No | The paper does not mention or provide any links to open-source code for the described methodology. |
| Open Datasets | No | The paper is theoretical and focuses on mathematical analysis of algorithms; it does not involve empirical training on datasets. Therefore, no information about publicly available datasets for training is provided. |
| Dataset Splits | No | The paper is theoretical and does not involve empirical experiments with datasets; therefore, no information about training/test/validation splits is provided. |
| Hardware Specification | No | The paper is theoretical and does not describe any empirical experiments; therefore, no hardware specifications are mentioned. |
| Software Dependencies | No | The paper is theoretical and does not describe any empirical experiments; therefore, no software dependencies with version numbers are mentioned. |
| Experiment Setup | No | The paper is theoretical and does not describe any empirical experiments; therefore, no experimental setup details such as hyperparameters or training configurations are provided. |
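The following is a minimal NumPy sketch of the three update rules in the pseudocode row above, not the authors' implementation. It assumes a generic finite-sum objective F_n(x) = (1/n) Σ_i f_i(x); the helpers `grad_full` and `grad_fi`, the toy objective in the `__main__` block, and all hyperparameter values are illustrative assumptions, not taken from the paper.

```python
import numpy as np

# Sketch of GLD, SGLD, and SVRG-LD for F_n(x) = (1/n) * sum_i f_i(x).
# grad_full(x) returns the full gradient; grad_fi(x, idx) returns the
# average gradient of the component functions indexed by idx.

rng = np.random.default_rng(0)


def gld(grad_full, d, eta, beta, K):
    """Algorithm 1 (GLD): full-gradient step plus injected Gaussian noise."""
    x = np.zeros(d)
    for _ in range(K):
        x = x - eta * grad_full(x) + np.sqrt(2.0 * eta / beta) * rng.standard_normal(d)
    return x


def sgld(grad_fi, n, d, eta, beta, B, K):
    """Algorithm 2 (SGLD): minibatch gradient step plus injected Gaussian noise."""
    y = np.zeros(d)
    for _ in range(K):
        idx = rng.choice(n, size=B, replace=False)
        y = y - eta * grad_fi(y, idx) + np.sqrt(2.0 * eta / beta) * rng.standard_normal(d)
    return y


def svrg_ld(grad_fi, grad_full, n, d, eta, beta, B, L, K):
    """Algorithm 3 (SVRG-LD): variance-reduced gradient step plus injected noise."""
    z = np.zeros(d)
    z_snap = z.copy()
    for _ in range(K // L):
        w_snap = grad_full(z_snap)  # full gradient at the epoch snapshot
        for _ in range(L):
            idx = rng.choice(n, size=B, replace=False)
            # control-variate estimate: minibatch gradient re-centered at the snapshot
            g = grad_fi(z, idx) - grad_fi(z_snap, idx) + w_snap
            z = z - eta * g + np.sqrt(2.0 * eta / beta) * rng.standard_normal(d)
        z_snap = z.copy()  # refresh the snapshot at the end of the epoch
    return z


if __name__ == "__main__":
    # Toy nonconvex finite-sum objective (illustrative only):
    # f_i(x) = 0.5 * ||x - a_i||^2 + cos(a_i . x)
    n, d = 100, 5
    A = rng.standard_normal((n, d))

    def grad_fi(x, idx):
        A_b = A[idx]
        return np.mean((x - A_b) - np.sin(A_b @ x)[:, None] * A_b, axis=0)

    def grad_full(x):
        return grad_fi(x, np.arange(n))

    print(gld(grad_full, d, eta=0.01, beta=100.0, K=1000))
    print(sgld(grad_fi, n, d, eta=0.01, beta=100.0, B=10, K=1000))
    print(svrg_ld(grad_fi, grad_full, n, d, eta=0.01, beta=100.0, B=10, L=10, K=1000))
```

The snapshot-based control variate in `svrg_ld` is what distinguishes Algorithm 3 from SGLD: each minibatch gradient is corrected by the difference from a periodically refreshed full gradient, reducing its variance without paying the full-gradient cost at every step, which is consistent with the abstract's claim that SVRG-LD outperforms GLD and SGLD in a wide regime.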