Stochastic Approximate Gradient Descent via the Langevin Algorithm
Authors: Yixuan Qiu, Xiao Wang (pp. 5428-5435)
AAAI 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We introduce a novel and efficient algorithm called the stochastic approximate gradient descent (SAGD), as an alternative to the stochastic gradient descent for cases where unbiased stochastic gradients cannot be trivially obtained. Traditional methods for such problems rely on general-purpose sampling techniques such as Markov chain Monte Carlo, which typically requires manual intervention for tuning parameters and does not work efficiently in practice. Instead, SAGD makes use of the Langevin algorithm to construct stochastic gradients that are biased in finite steps but accurate asymptotically, enabling us to theoretically establish the convergence guarantee for SAGD. Inspired by our theoretical analysis, we also provide useful guidelines for its practical implementation. Finally, we show that SAGD performs well experimentally in popular statistical and machine learning problems such as the expectation-maximization algorithm and the variational autoencoders. |
| Researcher Affiliation | Academia | 1 Department of Statistics and Data Science, Carnegie Mellon University, yixuanq@andrew.cmu.edu; 2 Department of Statistics, Purdue University, wangxiao@purdue.edu |
| Pseudocode | Yes | Algorithm 1: Stochastic approximate gradient descent for minimizing F(θ) = E[f(θ; ξ)] |
| Open Source Code | No | The paper does not provide concrete access to source code for the methodology described. |
| Open Datasets | Yes | In the last experiment, we consider the MNIST handwritten digits data set, and fit generative models on it. |
| Dataset Splits | No | The paper uses synthetic data and MNIST but does not explicitly provide specific dataset split information (e.g., exact percentages or sample counts for training, validation, and testing). |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory amounts) used for running its experiments. |
| Software Dependencies | No | The paper does not provide specific ancillary software details with version numbers. |
| Experiment Setup | Yes | We set the initial value to be θ0 = (0, 1), and run both SAGD and exact GD for T = 100 iterations in each M-step, with a constant step size αt = 0.2. For SAGD, Langevin parameters are specified as δt = 0.1/√t and Kt = t + 20, with the first 100 Langevin iterations discarded as burn-in, similar to that in MCMC. ... we first train a VAE model with 5000 iterations, and then fine-tune the neural network parameter θ by running the following four training algorithms for additional 1000 iterations: ... For HMC we use the same step size and chain length as SAGD, and run L = 5 leapfrog steps to get each proposal. ... We first train a VAE model for 500 epochs with a batch size of 200, and then run SAGD for 100 epochs for fine-tuning. In SAGD, twenty independent chains are used to compute the approximate gradient, each with five burn-in iterations. |
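
The Pseudocode and Experiment Setup rows above outline the core of Algorithm 1: at each outer iteration, a short Langevin chain produces an approximate stochastic gradient that is biased in finite steps but accurate asymptotically, and this gradient drives an ordinary descent step. The sketch below is a hypothetical Python rendering of that loop under stated assumptions, not the authors' implementation: the callables `grad_f` and `grad_log_p`, the noise scaling in the Langevin update, and the multi-chain averaging are assumptions, while the schedules δt = 0.1/√t, Kt = t + 20 and the constant step size α = 0.2 are taken from the quoted experiment setup.

```python
import numpy as np

def sagd(theta0, grad_f, grad_log_p, xi0, T=100, alpha=0.2,
         n_chains=1, burn_in=0, rng=None):
    """Hypothetical sketch of stochastic approximate gradient descent (SAGD).

    grad_f(theta, xi)     -- gradient of f(theta; xi) with respect to theta
    grad_log_p(xi, theta) -- gradient of log p(xi | theta) with respect to xi,
                             used by the (unadjusted) Langevin sampler
    The schedules delta_t = 0.1/sqrt(t) and K_t = t + 20 follow the quoted
    experiment setup; everything else is illustrative.
    """
    rng = np.random.default_rng() if rng is None else rng
    theta = np.asarray(theta0, dtype=float)
    xi = np.tile(np.asarray(xi0, dtype=float), (n_chains, 1))  # one state per chain

    for t in range(1, T + 1):
        delta_t = 0.1 / np.sqrt(t)   # decreasing Langevin step size
        K_t = t + 20                 # Langevin chain length at iteration t

        grads = []
        for k in range(K_t):
            # One unadjusted Langevin step per chain (one common convention;
            # the paper's exact noise scaling may differ).
            drift = np.array([grad_log_p(x, theta) for x in xi])
            noise = rng.standard_normal(xi.shape)
            xi = xi + delta_t * drift + np.sqrt(2.0 * delta_t) * noise
            if k >= burn_in:
                grads.append(np.mean([grad_f(theta, x) for x in xi], axis=0))

        # Approximate (biased, asymptotically accurate) stochastic gradient
        g_t = np.mean(grads, axis=0)
        theta = theta - alpha * g_t

    return theta
```

The `burn_in` and `n_chains` arguments are only meant to indicate where the quoted choices (discarding initial Langevin iterations, averaging over twenty independent chains in the VAE fine-tuning experiment) would enter; the paper itself should be consulted for the exact algorithmic details.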