Markov Chain Monte Carlo and Variational Inference: Bridging the Gap

Authors: Tim Salimans, Diederik Kingma, Max Welling

ICML 2015

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We describe the theoretical foundations that make this possible and show some promising first results. As a first example we look at sampling from the bivariate Gaussian distribution... To demonstrate our Hamiltonian variational approximation algorithm we use an example from (Albert, 2009)... Next, we demonstrate the effectiveness of our Hamiltonian variational inference approach for learning deep generative neural network models. These models are fitted to a binarized version of the MNIST dataset... See table 1 for our numerical results and a comparison to reported results with other methods.
Researcher Affiliation | Collaboration | Tim Salimans (TIM@ALGORITMICA.NL), Algoritmica; Diederik P. Kingma and Max Welling ([D.P.KINGMA,M.WELLING]@UVA.NL), University of Amsterdam
Pseudocode | Yes | Algorithm 1: MCMC lower bound estimate; Algorithm 2: Markov Chain Variational Inference (MCVI); Algorithm 3: Hamiltonian variational inference (HVI); Algorithm 4: Sequential MCVI. (A sketch of the Algorithm 1 estimator appears after the table.)
Open Source Code | No | The paper does not provide any specific link or statement about open-sourcing the code for the described methodology.
Open Datasets | Yes | These models are fitted to a binarized version of the MNIST dataset as e.g. used in (Uria et al., 2014). (A binarization sketch appears after the table.)
Dataset Splits | Yes | Before fitting our models to the full training set, the model hyper-parameters and number of training epochs were determined based on performance on a validation set of about 15% of the available training data.
Hardware Specification | No | The paper mentions computational cost and the use of 'automatic differentiation packages' but does not specify any particular hardware, such as CPU models, GPU models, or memory, used for the experiments.
Software Dependencies | No | The paper mentions Theano and Adam as software used: 'automatic differentiation package such as Theano (Bastien et al., 2012)' and 'Stochastic gradient-based optimization was performed using Adam (Kingma & Ba, 2014) with default hyperparameters.' However, no version numbers are provided for these software dependencies.
Experiment Setup | Yes | We choose q(z_0), q(v'_1|z_0), r(v_1|z_1) to all be multivariate Gaussian distributions with diagonal covariance matrix. The mass matrix M is also diagonal. The means of q(v'_1|z_0) and r(v_1|z_1) are defined as linear functions of z and the gradient ∇_z log p(x, z), with adjustable coefficients. The auxiliary inference model r(v|x, z) is chosen to be a fully-connected neural network with one deterministic hidden layer with nh = 300 hidden units with softplus (log(1 + exp(x))) activations and a Gaussian output variable with diagonal covariance. The number of leapfrog steps was varied from 0 to 16. After broader model search with a validation set, we trained a final model with 16 leapfrog steps and nh = 800. Stochastic gradient-based optimization was performed using Adam (Kingma & Ba, 2014) with default hyperparameters. (A sketch of one leapfrog step appears after the table.)
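
To make the Pseudocode entry concrete, below is a minimal Python sketch of the single-sample lower bound estimator that Algorithm 1 describes, assuming user-supplied callables for the log densities and samplers; the function and argument names (log_p, sample_q0, etc.) are illustrative and not taken from any released code.

    def mcmc_lower_bound_estimate(x, sample_q0, log_q0, sample_qt, log_qt,
                                  log_rt, log_p, T):
        """Single stochastic estimate of the auxiliary-variable lower bound
        log p(x) >= E_q[ log p(x, z_T) - log q(z_0|x)
                         + sum_t ( log r_t(z_{t-1}|x, z_t)
                                   - log q_t(z_t|x, z_{t-1}) ) ].
        """
        z = sample_q0(x)                   # z_0 ~ q(z_0 | x)
        L = -log_q0(z, x)                  # subtract log q(z_0 | x)
        for t in range(1, T + 1):
            z_new = sample_qt(z, x, t)     # z_t ~ q_t(z_t | x, z_{t-1})
            L += log_rt(z, z_new, x, t)    # add log r_t(z_{t-1} | x, z_t)
            L -= log_qt(z_new, z, x, t)    # subtract log q_t(z_t | x, z_{t-1})
            z = z_new
        return L + log_p(x, z)             # add log p(x, z_T)

In the paper this estimate is differentiated with respect to the variational parameters (via stochastic gradients) to maximize the bound, which is what Algorithms 2 to 4 build on.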
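For the Open Datasets entry, the exact binarization procedure is deferred to the citation (Uria et al., 2014); the sketch below shows two common schemes for grey-level MNIST intensities in [0, 1], as an assumption rather than the authors' preprocessing code.

    import numpy as np

    def binarize_mnist(images, stochastic=True, seed=0):
        """Binarize MNIST images with pixel intensities in [0, 1].
        stochastic=True resamples each pixel as Bernoulli(p = intensity);
        stochastic=False thresholds at 0.5. Which variant the paper uses
        is an assumption here, not stated in the quoted text.
        """
        if stochastic:
            rng = np.random.default_rng(seed)
            return (rng.random(images.shape) < images).astype(np.float32)
        return (images > 0.5).astype(np.float32)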
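For the Experiment Setup entry, the leapfrog steps refer to the standard integrator for Hamiltonian dynamics; a hedged NumPy-style sketch with a diagonal mass matrix follows. grad_log_p and the parameter names are placeholders, not the authors' Theano implementation.

    def leapfrog_step(z, v, grad_log_p, step_size, mass_diag):
        """One leapfrog update of position z and momentum v (NumPy arrays)
        for the potential energy U(z) = -log p(x, z), with diagonal mass
        matrix M = diag(mass_diag)."""
        v = v + 0.5 * step_size * grad_log_p(z)   # half step for the momentum
        z = z + step_size * v / mass_diag         # full step for the position
        v = v + 0.5 * step_size * grad_log_p(z)   # second half step for the momentum
        return z, v

In the reported setup such a step would be applied up to 16 times per sample, with the z-dependent Gaussians q(v'_1|z_0) and r(v_1|z_1) supplying and scoring the momenta.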