Learning Deep Latent Gaussian Models with Markov Chain Monte Carlo
Authors: Matthew D. Hoffman
ICML 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In section 5, we show that the MCMC approach eliminates overpruning and blurriness (two known issues with variational autoencoders), and achieves good held-out log-likelihood on the dynamically binarized permutation-invariant MNIST dataset. and 5. Experiments: Below, we compare the results of our proposed method to a baseline mean-field VAE and to other published results on the binarized MNIST dataset (Le Cun et al., 1998). and Table 1. Reported held-out log-likelihoods on dynamically binarized permutation-invariant MNIST |
| Researcher Affiliation | Industry | 1Google, San Francisco, California, USA. Correspondence to: Matthew D. Hoffman <mhoffman@google.com>. |
| Pseudocode | Yes | Algorithm 1 Hamiltonian Monte Carlo and Algorithm 2 Hamiltonian Monte Carlo for DLGMs |
| Open Source Code | No | The paper does not provide any explicit statements about releasing source code or links to a code repository. |
| Open Datasets | Yes | For MNIST, we used 60,000 images for training and held out 10,000 for testing. The training images were re-binarized each epoch to prevent overfitting (as done by, e.g., Burda et al., 2016). and Le Cun, Yann, Cortes, Corinna, and Burges, Christopher JC. The mnist database of handwritten digits, 1998. |
| Dataset Splits | No | For MNIST, we used 60,000 images for training and held out 10,000 for testing. There is no explicit mention of a separate validation set split. |
| Hardware Specification | Yes | Training an MNIST model with M = 20 took about 8 hours on an NVIDIA K20 GPU, 21 times longer than with M = 11. |
| Software Dependencies | No | The paper mentions using 'computational graph languages such as TensorFlow' but does not provide specific version numbers for any software dependencies. |
| Experiment Setup | Yes | All models were optimized using Adam (Kingma & Ba, 2015) with default parameters and a minibatch size of 250. We trained all models for 500 epochs with a learning rate of 0.001, then for another 100 epochs with a learning rate of 0.0001. |
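
The Pseudocode row cites Algorithms 1 and 2 (Hamiltonian Monte Carlo and Hamiltonian Monte Carlo for DLGMs). As a point of reference, below is a minimal NumPy sketch of one generic HMC transition with a leapfrog integrator and Metropolis correction; the `log_prob`, `grad_log_prob`, `step_size`, and `n_leapfrog` arguments are illustrative placeholders, and the sketch does not reproduce the paper's DLGM-specific Algorithm 2.

```python
import numpy as np

def hmc_step(z, log_prob, grad_log_prob, step_size=0.1, n_leapfrog=10, rng=np.random):
    """One generic HMC transition targeting exp(log_prob(z)).

    All arguments are illustrative placeholders; the paper's Algorithm 2
    runs HMC over the DLGM latent codes, which is not reproduced here.
    """
    z0 = z.copy()
    r = rng.standard_normal(z.shape)                      # sample momentum
    current_H = -log_prob(z0) + 0.5 * np.sum(r ** 2)      # potential + kinetic energy

    # Leapfrog integration of the Hamiltonian dynamics.
    r = r + 0.5 * step_size * grad_log_prob(z)
    for _ in range(n_leapfrog - 1):
        z = z + step_size * r
        r = r + step_size * grad_log_prob(z)
    z = z + step_size * r
    r = r + 0.5 * step_size * grad_log_prob(z)

    proposed_H = -log_prob(z) + 0.5 * np.sum(r ** 2)

    # Metropolis accept/reject corrects for leapfrog discretization error.
    if np.log(rng.uniform()) < current_H - proposed_H:
        return z        # accept the proposal
    return z0           # reject: stay at the current state
```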
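The Open Datasets row quotes the dynamic binarization protocol ("re-binarized each epoch"). A minimal sketch, assuming grayscale pixel intensities in [0, 1] are treated as Bernoulli probabilities:

```python
import numpy as np

def dynamic_binarize(images, rng=np.random):
    """Resample binary pixels: intensities in [0, 1] act as Bernoulli
    probabilities, so each epoch sees a fresh binarization (the
    dynamically binarized MNIST protocol quoted from Burda et al., 2016)."""
    return (rng.uniform(size=images.shape) < images).astype(np.float32)
```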
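The Experiment Setup row specifies Adam with default parameters, a minibatch size of 250, and a two-phase schedule (500 epochs at learning rate 0.001, then 100 epochs at 0.0001). A minimal training-loop sketch under those settings; `model.adam_step` is a hypothetical stand-in for one Adam update on a minibatch, since the paper does not release training code.

```python
import numpy as np

def train(model, train_images, rng=np.random):
    """Two-phase schedule from the quoted setup: Adam with default parameters,
    minibatch size 250, 500 epochs at 1e-3, then 100 more epochs at 1e-4.
    `model.adam_step` is hypothetical and not part of the paper."""
    n = train_images.shape[0]                              # 60,000 MNIST training images
    for epoch in range(500 + 100):
        lr = 1e-3 if epoch < 500 else 1e-4                 # learning-rate drop after epoch 500
        # Re-binarize the data each epoch (dynamic binarization, as quoted above).
        binary = (rng.uniform(size=train_images.shape) < train_images).astype(np.float32)
        perm = rng.permutation(n)
        for start in range(0, n, 250):                     # minibatch size 250
            batch = binary[perm[start:start + 250]]
            model.adam_step(batch, learning_rate=lr)       # hypothetical Adam update
```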