Black-Box Variational Inference as a Parametric Approximation to Langevin Dynamics
Authors: Matthew Hoffman, Yi-An Ma
ICML 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section we empirically evaluate various flavors of BBVI and gradient-based MCMC to see how well the theoretical results of sections 2 and 4.3 agree with practice. |
| Researcher Affiliation | Collaboration | Matthew Hoffman (Google Research); Yi-An Ma (Google Research and Halıcıoğlu Data Science Institute, University of California, San Diego, USA). |
| Pseudocode | No | The paper describes algorithms and their updates using mathematical equations (e.g., equation 3 for Langevin dynamics) but does not provide structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper states "All experiments were done using TensorFlow Probability (Dillon et al., 2017; Lao et al., 2020)." but does not explicitly provide a link to or state the availability of the source code for the methodology or experiments described in this paper. |
| Open Datasets | Yes | Item-Response-Theory Model: This is the posterior of a one-parameter-logistic item-response-theory (IRT) model from the Stan (Carpenter et al., 2017) examples repository (https://github.com/stan-dev/example-models/blob/master/misc/irt/irt.stan). Sparse Logistic Regression: This is the logistic regression model with soft-sparsity priors considered by Hoffman et al. (2019) applied to the German credit dataset (Dua & Graff, 2019). |
| Dataset Splits | No | The paper describes the models and datasets used (e.g., Item-Response Theory, German credit dataset) but does not provide specific train/validation/test splits, percentages, or explicit methodologies for partitioning the data into these sets. |
| Hardware Specification | Yes | Note that the BBVI scheme computes a minibatch of 100 gradients of the target density per step (which reduces the variance of its gradient estimates, and thereby lets it take larger steps), so the comparison is fair: the wallclock time per gradient evaluation of the BBVI algorithm and the 100-chain MCMC algorithms is nearly identical on a Pascal Titan X GPU. |
| Software Dependencies | Yes | All experiments were done using TensorFlow Probability (Dillon et al., 2017; Lao et al., 2020). |
| Experiment Setup | Yes | BBVI gradients were estimated using a minibatch of 100 samples from q. Each algorithm used a manually tuned constant step size. We evaluate BBVI with vanilla SGD and with momentum 0.9, Metropolis-adjusted Langevin, and Hamiltonian Monte Carlo with 10 leapfrog steps. For BBVI, we used a diagonal-covariance Gaussian variational family parameterized by the flow $\theta_d = \mu_d + 0.1\log(1 + e^{10\sigma_d})\epsilon_d$. To estimate the ground-truth means and standard deviations for each model, we ran 500 HMC chains of 1000 iterations each, discarding the first 500 samples of each chain. (Illustrative sketches of the variational flow, the Langevin update, and the leapfrog integrator follow the table.) |
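To make the variational family in the Experiment Setup row concrete, here is a minimal sketch of the reported reparameterization flow $\theta_d = \mu_d + 0.1\log(1 + e^{10\sigma_d})\epsilon_d$. The function name `sample_q` and the use of NumPy (rather than the TensorFlow Probability stack the paper actually used) are illustrative assumptions, not the authors' code.

```python
import numpy as np

def sample_q(mu, sigma, rng):
    """Reparameterized draw from the diagonal-Gaussian variational family
    reported in the paper: theta_d = mu_d + 0.1 * log(1 + exp(10 * sigma_d)) * eps_d,
    with eps_d ~ N(0, 1). The 0.1 * softplus(10 * sigma) transform keeps
    the per-dimension scale positive. NumPy stand-in, not the paper's code."""
    eps = rng.standard_normal(mu.shape)            # eps ~ N(0, I)
    scale = 0.1 * np.logaddexp(0.0, 10.0 * sigma)  # stable log(1 + e^{10 sigma})
    return mu + scale * eps

# Minibatch of 100 draws, matching the reported gradient-estimation batch size:
rng = np.random.default_rng(0)
mu, sigma = np.zeros(5), np.zeros(5)
samples = np.stack([sample_q(mu, sigma, rng) for _ in range(100)])
```

In this family, `mu` and `sigma` are the free variational parameters optimized by SGD; averaging ELBO gradients over the 100 reparameterized samples gives the minibatch estimator the report describes.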
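The paper's MCMC baseline is (Metropolis-adjusted) Langevin dynamics. Its equation 3 is not quoted in this report, so the sketch below assumes the standard overdamped form $\theta' = \theta + \epsilon\nabla\log p(\theta) + \sqrt{2\epsilon}\,\xi$; the accept/reject correction that makes it Metropolis-adjusted is omitted.

```python
import numpy as np

def langevin_step(theta, grad_log_p, step_size, rng):
    """One unadjusted overdamped Langevin step (standard form, assumed to
    correspond to the LD update the paper calls equation 3):
    theta' = theta + eps * grad log p(theta) + sqrt(2 * eps) * noise.
    The Metropolis-adjusted variant evaluated in the paper would treat
    this as a proposal and apply an accept/reject test on top."""
    noise = rng.standard_normal(theta.shape)
    return theta + step_size * grad_log_p(theta) + np.sqrt(2.0 * step_size) * noise
```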
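Similarly, the HMC baseline is reported to use 10 leapfrog steps per proposal. Below is a standard leapfrog integrator sketch under that setting; the momentum resampling and accept/reject step of a full HMC transition are omitted.

```python
import numpy as np

def leapfrog(theta, momentum, grad_log_p, step_size, num_steps=10):
    """Standard leapfrog integration of Hamiltonian dynamics; num_steps=10
    matches the setting reported for the paper's HMC runs. A full HMC
    transition would resample momentum beforehand and apply a Metropolis
    accept/reject test to the returned state."""
    momentum = momentum + 0.5 * step_size * grad_log_p(theta)     # half kick
    for i in range(num_steps):
        theta = theta + step_size * momentum                      # drift
        if i < num_steps - 1:
            momentum = momentum + step_size * grad_log_p(theta)   # full kick
    momentum = momentum + 0.5 * step_size * grad_log_p(theta)     # final half kick
    return theta, momentum
```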