Stochastic Approximate Gradient Descent via the Langevin Algorithm
Authors: Yixuan Qiu, Xiao Wang (pp. 5428-5435)
AAAI 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We introduce a novel and efficient algorithm called the stochastic approximate gradient descent (SAGD), as an alternative to the stochastic gradient descent for cases where unbiased stochastic gradients cannot be trivially obtained. Traditional methods for such problems rely on general-purpose sampling techniques such as Markov chain Monte Carlo, which typically requires manual intervention for tuning parameters and does not work efficiently in practice. Instead, SAGD makes use of the Langevin algorithm to construct stochastic gradients that are biased in finite steps but accurate asymptotically, enabling us to theoretically establish the convergence guarantee for SAGD. Inspired by our theoretical analysis, we also provide useful guidelines for its practical implementation. Finally, we show that SAGD performs well experimentally in popular statistical and machine learning problems such as the expectation-maximization algorithm and the variational autoencoders. |
| Researcher Affiliation | Academia | 1 Department of Statistics and Data Science, Carnegie Mellon University, yixuanq@andrew.cmu.edu; 2 Department of Statistics, Purdue University, wangxiao@purdue.edu |
| Pseudocode | Yes | Algorithm 1: Stochastic approximate gradient descent for minimizing F(θ) = E[f(θ; ξ)] |
| Open Source Code | No | The paper does not provide concrete access to source code for the methodology described. |
| Open Datasets | Yes | In the last experiment, we consider the MNIST handwritten digits data set, and fit generative models on it. |
| Dataset Splits | No | The paper uses synthetic data and MNIST but does not explicitly provide specific dataset split information (e.g., exact percentages or sample counts for training, validation, and testing). |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory amounts) used for running its experiments. |
| Software Dependencies | No | The paper does not provide specific ancillary software details with version numbers. |
| Experiment Setup | Yes | We set the initial value to be θ0 = (0, 1), and run both SAGD and exact GD for T = 100 iterations in each M-step, with a constant step size αt = 0.2. For SAGD, Langevin parameters are specified as δt = 0.1/√t and Kt = t + 20, with the first 100 Langevin iterations discarded as burn-in, similar to that in MCMC. ... we first train a VAE model with 5000 iterations, and then fine-tune the neural network parameter θ by running the following four training algorithms for additional 1000 iterations: ... For HMC we use the same step size and chain length as SAGD, and run L = 5 leapfrog steps to get each proposal. ... We first train a VAE model for 500 epochs with a batch size of 200, and then run SAGD for 100 epochs for fine-tuning. In SAGD, twenty independent chains are used to compute the approximate gradient, each with five burn-in iterations. |
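
The Pseudocode and Experiment Setup rows above outline the core of Algorithm 1: at each outer iteration, a short Langevin chain produces an approximate stochastic gradient that is biased in finite steps but accurate asymptotically, and this gradient drives an ordinary descent step. The sketch below is a hypothetical Python rendering of that loop under stated assumptions, not the authors' implementation: the callables `grad_f` and `grad_log_p`, the noise scaling in the Langevin update, and the multi-chain averaging are assumptions, while the schedules δt = 0.1/√t, Kt = t + 20 and the constant step size α = 0.2 are taken from the quoted experiment setup.

```python
import numpy as np

def sagd(theta0, grad_f, grad_log_p, xi0, T=100, alpha=0.2,
         n_chains=1, burn_in=0, rng=None):
    """Hypothetical sketch of stochastic approximate gradient descent (SAGD).

    grad_f(theta, xi)     -- gradient of f(theta; xi) with respect to theta
    grad_log_p(xi, theta) -- gradient of log p(xi | theta) with respect to xi,
                             used by the (unadjusted) Langevin sampler
    The schedules delta_t = 0.1/sqrt(t) and K_t = t + 20 follow the quoted
    experiment setup; everything else is illustrative.
    """
    rng = np.random.default_rng() if rng is None else rng
    theta = np.asarray(theta0, dtype=float)
    xi = np.tile(np.asarray(xi0, dtype=float), (n_chains, 1))  # one state per chain

    for t in range(1, T + 1):
        delta_t = 0.1 / np.sqrt(t)   # decreasing Langevin step size
        K_t = t + 20                 # Langevin chain length at iteration t

        grads = []
        for k in range(K_t):
            # One unadjusted Langevin step per chain (one common convention;
            # the paper's exact noise scaling may differ).
            drift = np.array([grad_log_p(x, theta) for x in xi])
            noise = rng.standard_normal(xi.shape)
            xi = xi + delta_t * drift + np.sqrt(2.0 * delta_t) * noise
            if k >= burn_in:
                grads.append(np.mean([grad_f(theta, x) for x in xi], axis=0))

        # Approximate (biased, asymptotically accurate) stochastic gradient
        g_t = np.mean(grads, axis=0)
        theta = theta - alpha * g_t

    return theta
```

The `burn_in` and `n_chains` arguments are only meant to indicate where the quoted choices (discarding initial Langevin iterations, averaging over twenty independent chains in the VAE fine-tuning experiment) would enter; the paper itself should be consulted for the exact algorithmic details.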