Learning Deep Latent Gaussian Models with Markov Chain Monte Carlo

Authors: Matthew D. Hoffman

ICML 2017

Reproducibility assessment (each entry gives the variable, the result, and the supporting LLM response):

Research Type: Experimental
LLM Response: "In section 5, we show that the MCMC approach eliminates overpruning and blurriness (two known issues with variational autoencoders), and achieves good held-out log-likelihood on the dynamically binarized permutation-invariant MNIST dataset."; "5. Experiments: Below, we compare the results of our proposed method to a baseline mean-field VAE and to other published results on the binarized MNIST dataset (Le Cun et al., 1998)."; "Table 1. Reported held-out log-likelihoods on dynamically binarized permutation-invariant MNIST"

Researcher Affiliation: Industry
LLM Response: "Google, San Francisco, California, USA. Correspondence to: Matthew D. Hoffman <mhoffman@google.com>."

Pseudocode: Yes
LLM Response: "Algorithm 1 Hamiltonian Monte Carlo" and "Algorithm 2 Hamiltonian Monte Carlo for DLGMs" (an illustrative HMC sketch follows the table)

Open Source Code: No
LLM Response: The paper does not provide any explicit statements about releasing source code or links to a code repository.

Open Datasets: Yes
LLM Response: "For MNIST, we used 60,000 images for training and held out 10,000 for testing. The training images were re-binarized each epoch to prevent overfitting (as done by, e.g., Burda et al., 2016)." together with the dataset reference "Le Cun, Yann, Cortes, Corinna, and Burges, Christopher J. C. The MNIST database of handwritten digits, 1998." (a sketch of per-epoch re-binarization follows the table)

Dataset Splits: No
LLM Response: "For MNIST, we used 60,000 images for training and held out 10,000 for testing." There is no explicit mention of a separate validation split.

Hardware Specification: Yes
LLM Response: "Training an MNIST model with M = 20 took about 8 hours on an NVIDIA K20 GPU, 21 times longer than with M = 11."

Software Dependencies: No
LLM Response: The paper mentions using "computational graph languages such as TensorFlow" but does not provide specific version numbers for any software dependencies.

Experiment Setup: Yes
LLM Response: "All models were optimized using Adam (Kingma & Ba, 2015) with default parameters and a minibatch size of 250. We trained all models for 500 epochs with a learning rate of 0.001, then for another 100 epochs with a learning rate of 0.0001." (a sketch of this optimizer schedule follows the table)
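
Algorithms 1 and 2 are given only in the paper itself. As a rough illustration of what a single Hamiltonian Monte Carlo transition involves, here is a minimal NumPy sketch of the standard leapfrog-plus-Metropolis HMC update for a generic log-density; `logp`, `grad_logp`, and the tuning values are hypothetical placeholders, not the paper's DLGM-specific Algorithm 2.

```python
import numpy as np

def hmc_step(z, logp, grad_logp, step_size=0.1, n_leapfrog=10, rng=np.random):
    """One generic HMC transition targeting exp(logp(z)).

    A sketch of the standard leapfrog/Metropolis recipe that Algorithm 1
    summarizes; not the paper's exact DLGM-specific implementation.
    """
    r = rng.standard_normal(z.shape)          # resample momentum
    z_new, r_new = z.copy(), r.copy()

    # Leapfrog integration of the Hamiltonian dynamics.
    r_new += 0.5 * step_size * grad_logp(z_new)
    for _ in range(n_leapfrog - 1):
        z_new += step_size * r_new
        r_new += step_size * grad_logp(z_new)
    z_new += step_size * r_new
    r_new += 0.5 * step_size * grad_logp(z_new)

    # Metropolis accept/reject on the joint (position, momentum) energy.
    current_h = -logp(z) + 0.5 * np.sum(r ** 2)
    proposed_h = -logp(z_new) + 0.5 * np.sum(r_new ** 2)
    if np.log(rng.uniform()) < current_h - proposed_h:
        return z_new
    return z

# Example: sample from a standard Gaussian, where grad log p(z) = -z.
z = np.zeros(2)
for _ in range(1000):
    z = hmc_step(z, lambda x: -0.5 * np.sum(x ** 2), lambda x: -x)
```
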
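The "re-binarized each epoch" detail quoted under Open Datasets (dynamic binarization) is easy to mistake for a one-time preprocessing step. The NumPy sketch below shows the usual reading, in which every epoch draws fresh Bernoulli samples using the grayscale pixel intensities as probabilities; the random `train_images` array and the empty epoch loop are placeholders, not the paper's pipeline.

```python
import numpy as np

rng = np.random.default_rng(0)

# Placeholder: grayscale MNIST training images scaled to [0, 1],
# flattened to shape (60000, 784) for the permutation-invariant setting.
train_images = rng.uniform(size=(60000, 784)).astype(np.float32)

def dynamic_binarize(images, rng):
    """Draw a fresh binary dataset: each pixel value is P(pixel = 1)."""
    return (rng.uniform(size=images.shape) < images).astype(np.float32)

for epoch in range(500):
    x_binary = dynamic_binarize(train_images, rng)  # new sample every epoch
    # ... run one training epoch on x_binary ...
```
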
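The quoted experiment setup (Adam with default parameters, minibatch size 250, 500 epochs at learning rate 0.001 followed by 100 epochs at 0.0001) corresponds to a two-stage schedule along the lines of the TensorFlow 2 sketch below. Only the optimizer choice, batch size, epoch counts, and learning rates come from the quoted text; the toy reconstruction model and random data are stand-ins, and reassigning the learning rate in place is one plausible way to implement the drop.

```python
import numpy as np
import tensorflow as tf

# Placeholder data and model standing in for the paper's DLGM training problem.
x_train = np.random.rand(60000, 784).astype("float32")
model = tf.keras.Sequential([
    tf.keras.layers.Dense(200, activation="relu"),
    tf.keras.layers.Dense(784, activation="sigmoid"),
])

# Adam with default parameters, starting at learning rate 0.001 (quoted setup).
optimizer = tf.keras.optimizers.Adam(learning_rate=1e-3)
model.compile(optimizer=optimizer, loss="binary_crossentropy")

# Stage 1: 500 epochs at lr = 0.001 with minibatches of 250.
model.fit(x_train, x_train, batch_size=250, epochs=500)

# Stage 2: drop the learning rate to 0.0001 and train for another 100 epochs.
optimizer.learning_rate.assign(1e-4)
model.fit(x_train, x_train, batch_size=250, epochs=100)
```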