Global Convergence of Langevin Dynamics Based Algorithms for Nonconvex Optimization
Authors: Pan Xu, Jinghui Chen, Difan Zou, Quanquan Gu
NeurIPS 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | We present a unified framework to analyze the global convergence of Langevin dynamics based algorithms for nonconvex finite-sum optimization with n component functions. At the core of our analysis is a direct analysis of the ergodicity of the numerical approximations to Langevin dynamics, which leads to faster convergence rates. Specifically, we show that gradient Langevin dynamics (GLD) and stochastic gradient Langevin dynamics (SGLD) converge to the almost minimizer within Õ(nd/(λε)) and Õ(d^7/(λ^5 ε^5)) stochastic gradient evaluations respectively, where d is the problem dimension, and λ is the spectral gap of the Markov chain generated by GLD. Both results improve upon the best known gradient complexity results [45]. Furthermore, for the first time we prove the global convergence guarantee for variance reduced stochastic gradient Langevin dynamics (SVRG-LD) to the almost minimizer within Õ(√n d^5/(λ^4 ε^(5/2))) stochastic gradient evaluations, which outperforms the gradient complexities of GLD and SGLD in a wide regime. Our theoretical analyses shed some light on using Langevin dynamics based algorithms for nonconvex optimization with provable guarantees. |
| Researcher Affiliation | Academia | Pan Xu, Department of Computer Science, UCLA, Los Angeles, CA 90095, panxu@cs.ucla.edu; Jinghui Chen, Department of Computer Science, University of Virginia, Charlottesville, VA 22903, jc4zg@virginia.edu; Difan Zou, Department of Computer Science, UCLA, Los Angeles, CA 90095, knowzou@cs.ucla.edu; Quanquan Gu, Department of Computer Science, UCLA, Los Angeles, CA 90095, qgu@cs.ucla.edu |
| Pseudocode | Yes | Algorithm 1 Gradient Langevin Dynamics (GLD) — input: step size η > 0; inverse temperature parameter β > 0; X_0 = 0. for k = 0, 1, ..., K−1 do: randomly draw ε_k ~ N(0, I_{d×d}); X_{k+1} = X_k − η∇F_n(X_k) + √(2η/β) ε_k; end for. Algorithm 2 Stochastic Gradient Langevin Dynamics (SGLD) — input: step size η > 0; batch size B; inverse temperature parameter β > 0; Y_0 = 0. for k = 0, 1, ..., K−1 do: randomly pick a subset I_k from {1, ..., n} of size |I_k| = B; randomly draw ε_k ~ N(0, I_{d×d}); Y_{k+1} = Y_k − (η/B) Σ_{i∈I_k} ∇f_i(Y_k) + √(2η/β) ε_k; end for. Algorithm 3 Stochastic Variance Reduced Gradient Langevin Dynamics (SVRG-LD) — input: step size η > 0; batch size B; epoch length L; inverse temperature parameter β > 0; initialization: Z_0 = 0, Z̃^(0) = Z_0. for s = 0, 1, ..., (K/L)−1 do: W̃ = ∇F_n(Z̃^(s)); for ℓ = 0, ..., L−1 do: k = sL + ℓ; randomly pick a subset I_k from {1, ..., n} of size |I_k| = B; draw ε_k ~ N(0, I_{d×d}); ∇̃_k = (1/B) Σ_{i_k∈I_k} (∇f_{i_k}(Z_k) − ∇f_{i_k}(Z̃^(s))) + W̃; Z_{k+1} = Z_k − η∇̃_k + √(2η/β) ε_k; end for; Z̃^(s+1) = Z_{(s+1)L}; end for. (A minimal Python sketch of these updates appears after the table.) |
| Open Source Code | No | The paper does not mention or provide any links to open-source code for the described methodology. |
| Open Datasets | No | The paper is theoretical and focuses on mathematical analysis of algorithms; it does not involve empirical training on datasets. Therefore, no information about publicly available datasets for training is provided. |
| Dataset Splits | No | The paper is theoretical and does not involve empirical experiments with datasets; therefore, no information about training/test/validation splits is provided. |
| Hardware Specification | No | The paper is theoretical and does not describe any empirical experiments; therefore, no hardware specifications are mentioned. |
| Software Dependencies | No | The paper is theoretical and does not describe any empirical experiments; therefore, no software dependencies with version numbers are mentioned. |
| Experiment Setup | No | The paper is theoretical and does not describe any empirical experiments; therefore, no experimental setup details such as hyperparameters or training configurations are provided. |
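The following is a minimal NumPy sketch of the three update rules in the pseudocode row above, not the authors' implementation. It assumes a generic finite-sum objective F_n(x) = (1/n) Σ_i f_i(x); the helpers `grad_full` and `grad_fi`, the toy objective in the `__main__` block, and all hyperparameter values are illustrative assumptions, not taken from the paper.

```python
import numpy as np

# Sketch of GLD, SGLD, and SVRG-LD for F_n(x) = (1/n) * sum_i f_i(x).
# grad_full(x) returns the full gradient; grad_fi(x, idx) returns the
# average gradient of the component functions indexed by idx.

rng = np.random.default_rng(0)


def gld(grad_full, d, eta, beta, K):
    """Algorithm 1 (GLD): full-gradient step plus injected Gaussian noise."""
    x = np.zeros(d)
    for _ in range(K):
        x = x - eta * grad_full(x) + np.sqrt(2.0 * eta / beta) * rng.standard_normal(d)
    return x


def sgld(grad_fi, n, d, eta, beta, B, K):
    """Algorithm 2 (SGLD): minibatch gradient step plus injected Gaussian noise."""
    y = np.zeros(d)
    for _ in range(K):
        idx = rng.choice(n, size=B, replace=False)
        y = y - eta * grad_fi(y, idx) + np.sqrt(2.0 * eta / beta) * rng.standard_normal(d)
    return y


def svrg_ld(grad_fi, grad_full, n, d, eta, beta, B, L, K):
    """Algorithm 3 (SVRG-LD): variance-reduced gradient step plus injected noise."""
    z = np.zeros(d)
    z_snap = z.copy()
    for _ in range(K // L):
        w_snap = grad_full(z_snap)  # full gradient at the epoch snapshot
        for _ in range(L):
            idx = rng.choice(n, size=B, replace=False)
            # control-variate estimate: minibatch gradient re-centered at the snapshot
            g = grad_fi(z, idx) - grad_fi(z_snap, idx) + w_snap
            z = z - eta * g + np.sqrt(2.0 * eta / beta) * rng.standard_normal(d)
        z_snap = z.copy()  # refresh the snapshot at the end of the epoch
    return z


if __name__ == "__main__":
    # Toy nonconvex finite-sum objective (illustrative only):
    # f_i(x) = 0.5 * ||x - a_i||^2 + cos(a_i . x)
    n, d = 100, 5
    A = rng.standard_normal((n, d))

    def grad_fi(x, idx):
        A_b = A[idx]
        return np.mean((x - A_b) - np.sin(A_b @ x)[:, None] * A_b, axis=0)

    def grad_full(x):
        return grad_fi(x, np.arange(n))

    print(gld(grad_full, d, eta=0.01, beta=100.0, K=1000))
    print(sgld(grad_fi, n, d, eta=0.01, beta=100.0, B=10, K=1000))
    print(svrg_ld(grad_fi, grad_full, n, d, eta=0.01, beta=100.0, B=10, L=10, K=1000))
```

The snapshot-based control variate in `svrg_ld` is what distinguishes Algorithm 3 from SGLD: each minibatch gradient is corrected by the difference from a periodically refreshed full gradient, reducing its variance without paying the full-gradient cost at every step, which is consistent with the abstract's claim that SVRG-LD outperforms GLD and SGLD in a wide regime.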