Markov Chain Monte Carlo and Variational Inference: Bridging the Gap
Authors: Tim Salimans, Diederik Kingma, Max Welling
ICML 2015 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We describe the theoretical foundations that make this possible and show some promising first results. As a first example we look at sampling from the bivariate Gaussian distribution... To demonstrate our Hamiltonian variational approximation algorithm we use an example from (Albert, 2009)... Next, we demonstrate the effectiveness of our Hamiltonian variational inference approach for learning deep generative neural network models. These models are fitted to a binarized version of the MNIST dataset... See table 1 for our numerical results and a comparison to reported results with other methods. |
| Researcher Affiliation | Collaboration | Tim Salimans (TIM@ALGORITMICA.NL), Algoritmica; Diederik P. Kingma and Max Welling ([D.P.KINGMA, M.WELLING]@UVA.NL), University of Amsterdam |
| Pseudocode | Yes | Algorithm 1 MCMC lower bound estimate; Algorithm 2 Markov Chain Variational Inference (MCVI); Algorithm 3 Hamiltonian variational inference (HVI); Algorithm 4 Sequential MCVI |
| Open Source Code | No | The paper does not provide any specific link or statement about open-sourcing the code for the described methodology. |
| Open Datasets | Yes | These models are fitted to a binarized version of the MNIST dataset as e.g. used in (Uria et al., 2014). |
| Dataset Splits | Yes | Before fitting our models to the full training set, the model hyper-parameters and number of training epochs were determined based on performance on a validation set of about 15% of the available training data. |
| Hardware Specification | No | The paper mentions computational cost and the use of 'automatic differentiation packages' but does not specify any particular hardware components like CPU models, GPU models, or memory specifications used for the experiments. |
| Software Dependencies | No | The paper mentions Theano and Adam as software used: 'automatic differentiation package such as Theano (Bastien et al., 2012)' and 'Stochastic gradient-based optimization was performed using Adam (Kingma & Ba, 2014) with default hyperparameters.' However, no version numbers are provided for these software dependencies. |
| Experiment Setup | Yes | We choose q(z0), q(v′1|z0), r(v1|z1) to all be multivariate Gaussian distributions with diagonal covariance matrix. The mass matrix M is also diagonal. The means of q(v′1|z0) and r(v1|z1) are defined as linear functions in z and ∇z log p(x, z), with adjustable coefficients. The auxiliary inference model r(v|x, z) is chosen to be a fully-connected neural network with one deterministic hidden layer with nh = 300 hidden units with softplus (log(1 + exp(x))) activations and a Gaussian output variable with diagonal covariance. The number of leapfrog steps was varied from 0 to 16. After broader model search with a validation set, we trained a final model with 16 leapfrog steps and nh = 800. Stochastic gradient-based optimization was performed using Adam (Kingma & Ba, 2014) with default hyperparameters. |
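
The Pseudocode row above lists the paper's algorithms. As a minimal sketch of Algorithm 1 (MCMC lower bound estimate), the code below uses a Gaussian random-walk transition and a matching Gaussian reverse model; the target density, transition, and reverse model are illustrative stand-ins, not the learned distributions from the paper's experiments.

```python
# Minimal sketch of Algorithm 1 (MCMC lower bound estimate).
# Illustrative choices (not from the paper): standard-Gaussian target,
# Gaussian random-walk transition q_t, and a Gaussian reverse model r_t
# centered at z_t. With learned q_t and r_t the bound becomes tighter.
import numpy as np

rng = np.random.default_rng(0)

def log_p(z):
    # Unnormalized target log p(x, z); a standard bivariate Gaussian placeholder.
    return -0.5 * np.sum(z ** 2)

def log_normal(z, mean, std):
    # Diagonal Gaussian log density.
    return np.sum(-0.5 * ((z - mean) / std) ** 2 - np.log(std) - 0.5 * np.log(2 * np.pi))

def mcmc_lower_bound_estimate(T=10, sigma=0.5, dim=2):
    # Draw z_0 ~ q(z_0) and initialize L = log p(x, z_0) - log q(z_0).
    z = rng.standard_normal(dim)
    L = log_p(z) - log_normal(z, np.zeros(dim), np.ones(dim))
    for _ in range(T):
        # Stochastic transition z_t ~ q_t(z_t | z_{t-1}): Gaussian random walk.
        z_new = z + sigma * rng.standard_normal(dim)
        # log alpha_t = log p(x, z_t) + log r_t(z_{t-1} | z_t)
        #             - log p(x, z_{t-1}) - log q_t(z_t | z_{t-1})
        log_alpha = (log_p(z_new) + log_normal(z, z_new, sigma)
                     - log_p(z) - log_normal(z_new, z, sigma))
        L += log_alpha
        z = z_new
    return L

# Averaging many single-sample estimates gives a stochastic lower bound on log p(x).
print(np.mean([mcmc_lower_bound_estimate() for _ in range(1000)]))
```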
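
The Experiment Setup row describes the Hamiltonian variational inference configuration (diagonal mass matrix, up to 16 leapfrog steps, Gaussian q and r). The sketch below shows one HVI transition in the spirit of Algorithm 3 under simplifying assumptions: standard-Gaussian momentum distributions for both q(v′|x, z) and r(v|x, z′), a placeholder target density, and hand-picked step size and mass, whereas the paper parameterizes the means of q and r as linear functions of z and ∇z log p(x, z) and learns all of these quantities.

```python
# Sketch of one Hamiltonian variational inference (HVI) transition:
# resample momentum, run leapfrog steps, accumulate the auxiliary-variable
# correction log r(v | z') - log q(v' | z). Distributions and constants here
# are illustrative placeholders, not the paper's learned parameters.
import numpy as np

rng = np.random.default_rng(1)

def log_p(z):
    # Unnormalized target log p(x, z); a standard Gaussian placeholder.
    return -0.5 * np.sum(z ** 2)

def grad_log_p(z):
    return -z

def hvi_transition(z, n_leapfrog=16, step_size=0.1, mass=1.0):
    dim = z.shape[0]
    # Draw the initial momentum v' ~ q(v'|x, z); here a standard Gaussian.
    v = rng.standard_normal(dim)
    log_q_v = np.sum(-0.5 * v ** 2 - 0.5 * np.log(2 * np.pi))
    # Leapfrog integration of Hamiltonian dynamics with a diagonal mass matrix.
    z_new, v_new = z.copy(), v.copy()
    v_new += 0.5 * step_size * grad_log_p(z_new)
    for i in range(n_leapfrog):
        z_new += step_size * v_new / mass
        if i < n_leapfrog - 1:
            v_new += step_size * grad_log_p(z_new)
    v_new += 0.5 * step_size * grad_log_p(z_new)
    # Auxiliary reverse model r(v|x, z'); here also a standard Gaussian.
    log_r_v = np.sum(-0.5 * v_new ** 2 - 0.5 * np.log(2 * np.pi))
    # Contribution of this deterministic, volume-preserving transition to the bound.
    log_alpha = log_p(z_new) - log_p(z) + log_r_v - log_q_v
    return z_new, log_alpha

# Example: one transition starting from a random initial point.
z0 = rng.standard_normal(2)
z1, log_alpha = hvi_transition(z0)
```

In the paper, the parameters of q, r, the step size, and the mass matrix are all optimized jointly with the generative model using Adam with default hyperparameters, as noted in the Software Dependencies and Experiment Setup rows.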