A Contrastive Divergence for Combining Variational Inference and MCMC
Authors: Francisco Ruiz, Michalis Titsias
ICML 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We show experimentally that optimizing the VCD leads to better predictive performance on two latent variable models: logistic matrix factorization and variational autoencoders (VAEs). Here we demonstrate the algorithm described in Section 2.5, which minimizes the variational contrastive divergence (VCD) with respect to the variational parameters θ. |
| Researcher Affiliation | Collaboration | 1University of Cambridge, Cambridge, UK; 2Columbia University, New York, USA; 3DeepMind, London, UK. |
| Pseudocode | Yes | Algorithm 1 Minimization of the VCD |
| Open Source Code | Yes | Code is available online at https://github.com/franrruiz/vcd_divergence. |
| Open Datasets | Yes | We use two datasets. The first one is the binarized MNIST data (Salakhutdinov & Murray, 2008), which contains 50,000 training images and 10,000 test images of hand-written digits. The second dataset is Fashion-MNIST (Xiao et al., 2017), which contains 60,000 training images and 10,000 test images of clothing items. |
| Dataset Splits | No | The paper specifies training and test set sizes (e.g., '50,000 training images and 10,000 test images' for MNIST) but does not explicitly state a separate validation set split or how it was used. |
| Hardware Specification | No | The paper states 'No parallelism or GPU acceleration was used.' but does not provide specific details on the CPU, memory, or other hardware components used for running the experiments. |
| Software Dependencies | No | The paper does not provide specific software names with version numbers for dependencies such as programming languages, libraries, or frameworks used in the experiments. |
| Experiment Setup | Yes | We set the number of HMC iterations t = 8, using 5 leapfrog steps. We set the learning rate η = 5 × 10⁻⁴ for the variational parameters corresponding to the mean, η = 2.5 × 10⁻⁴ for the variational parameters corresponding to the covariance, and η = 5 × 10⁻⁴ for the model parameters φ. We additionally decrease the learning rate by a factor of 0.9 every 15,000 iterations. We run 400,000 iterations of each optimization algorithm. We perform stochastic VI by subsampling a minibatch of observations at each iteration (Hoffman et al., 2013); we set the minibatch size to 100. |
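
The experiment-setup row reports that samples drawn from the variational distribution are refined with t = 8 HMC iterations of 5 leapfrog steps each before the VCD gradient is formed. The minimal sketch below illustrates only that refinement step: the step size `eps`, the toy Gaussian target, and the function names (`leapfrog`, `hmc_refine`) are assumptions for illustration, and the paper's VCD gradient estimator itself is not reproduced here.

```python
# Minimal, self-contained sketch of the reported HMC refinement
# (t = 8 HMC iterations, 5 leapfrog steps each). Step size and the
# toy Gaussian target are illustrative assumptions, not values from the paper.
import numpy as np

def leapfrog(z, r, grad_log_p, eps, num_steps):
    """Leapfrog integrator for one HMC proposal."""
    r = r + 0.5 * eps * grad_log_p(z)
    for _ in range(num_steps - 1):
        z = z + eps * r
        r = r + eps * grad_log_p(z)
    z = z + eps * r
    r = r + 0.5 * eps * grad_log_p(z)
    return z, r

def hmc_refine(z0, log_p, grad_log_p, num_iters=8, num_leapfrog=5, eps=0.1, rng=None):
    """Refine an initial sample z0 (e.g. a draw from q_theta) with
    `num_iters` Metropolis-adjusted HMC transitions targeting log_p."""
    rng = rng or np.random.default_rng(0)
    z = np.array(z0, dtype=float)
    for _ in range(num_iters):
        r0 = rng.standard_normal(z.shape)
        z_prop, r_prop = leapfrog(z, r0, grad_log_p, eps, num_leapfrog)
        log_accept = (log_p(z_prop) - 0.5 * r_prop @ r_prop) - (log_p(z) - 0.5 * r0 @ r0)
        if np.log(rng.uniform()) < log_accept:
            z = z_prop
    return z

if __name__ == "__main__":
    # Toy check: refine a poor initial point toward a standard Gaussian target.
    log_p = lambda z: -0.5 * z @ z
    grad_log_p = lambda z: -z
    z_t = hmc_refine(np.full(10, 3.0), log_p, grad_log_p)
    print("refined sample norm:", np.linalg.norm(z_t))
```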
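The same row also fixes the optimization schedule: base step sizes of 5 × 10⁻⁴ (variational means), 2.5 × 10⁻⁴ (variational covariances), and 5 × 10⁻⁴ (model parameters φ), decayed by a factor of 0.9 every 15,000 iterations over 400,000 iterations with minibatches of 100. The sketch below assumes each parameter group simply applies its base rate under that decay; how the authors grouped and applied the rates in their code is not specified.

```python
# Hedged sketch of the reported optimization schedule. The grouping of
# parameters and the schedule function are assumptions for illustration.
BASE_LR = {"q_mean": 5e-4, "q_cov": 2.5e-4, "model_phi": 5e-4}
NUM_ITERS = 400_000
MINIBATCH_SIZE = 100
DECAY_FACTOR = 0.9
DECAY_EVERY = 15_000

def learning_rate(param_group: str, iteration: int) -> float:
    """Step size for `param_group` at a given iteration under the stated schedule."""
    return BASE_LR[param_group] * DECAY_FACTOR ** (iteration // DECAY_EVERY)

# Example: the mean's step size after 30,000 iterations is 5e-4 * 0.9**2 = 4.05e-4.
print(learning_rate("q_mean", 30_000))
```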