Neural Variational Inference and Learning in Undirected Graphical Models

Authors: Volodymyr Kuleshov, Stefano Ermon

NeurIPS 2017

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We empirically demonstrate the effectiveness of our method on several popular generative modeling datasets. We start with an experiment aimed at visualizing the importance of tracking the target distribution p using q during learning. We use Equation 6 to optimize the likelihood of a 5x5 Ising MRF with coupling factor J and unaries chosen randomly in {-10^-2, 10^-2}. Next, we use our method to train Restricted Boltzmann Machines (RBMs) on the UCI digits dataset [30]. Next, we use the variational objective (7) to learn two types of hybrid directed/undirected models: a variational autoencoder (VAE) and an auxiliary variable deep generative model (ADGM) [18]. (A minimal Ising MRF sketch appears after this table.)
Researcher Affiliation | Academia | Volodymyr Kuleshov, Stanford University, Stanford, CA 94305, kuleshov@cs.stanford.edu; Stefano Ermon, Stanford University, Stanford, CA 94305, ermon@cs.stanford.edu
Pseudocode | No | The paper describes the methods textually and mathematically, but it does not include any explicit pseudocode blocks or algorithms.
Open Source Code | No | The paper states, 'Both PCD and our method were implemented in Theano [32].' This refers to a third-party tool used, not the authors' own source code release.
Open Datasets | Yes | Next, we use our method to train Restricted Boltzmann Machines (RBMs) on the UCI digits dataset [30], which contains 10,992 8x8 images of handwritten digits; we augment this data by moving each image 1px to the left, right, up, and down. We show in Table 1 the test set negative log-likelihoods on the binarized MNIST [33] and 28x28 Omniglot [17] datasets. (A sketch of the 1px shift augmentation appears after this table.)
Dataset Splits | No | The paper names the datasets used (UCI digits, MNIST, Omniglot) but does not give the specific details needed to reproduce the splits into training, validation, or test sets (e.g., percentages or exact counts). It reports a 'test set negative log-likelihood' but does not specify the split.
Hardware Specification | No | The paper does not specify any hardware used for running the experiments (e.g., GPU or CPU models, memory, or cloud computing instances).
Software Dependencies | No | The paper mentions 'Both PCD and our method were implemented in Theano [32]' and 'ADAM [31]' but does not provide specific version numbers for these software components, which are needed for reproducibility.
Experiment Setup | Yes | We train an RBM with 100 hidden units using ADAM [31] with batch size 100, a learning rate of 3x10^-4, β1 = 0.9, and β2 = 0.999; we choose q to be a uniform mixture of K = 10 Bernoulli distributions. We train all neural networks for 200 epochs with ADAM (same parameters as above) and neural variational inference (NVIL) with control variates as described in Mnih and Rezende [9]. (A sketch of this training configuration appears after this table.)
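
The Ising MRF experiment quoted under Research Type can be made concrete with a short sketch. This is not the authors' code: the grid energy parameterization, the particular value of the coupling factor J, and all variable names are assumptions; only the 5x5 size and the unaries drawn from {-10^-2, 10^-2} come from the quoted text.

    # Minimal sketch (not the authors' code) of a 5x5 Ising MRF with coupling J
    # and unary potentials drawn uniformly from {-1e-2, +1e-2}.
    import numpy as np

    rng = np.random.default_rng(0)
    n = 5
    J = 0.5                                        # coupling factor (assumed value)
    unaries = rng.choice([-1e-2, 1e-2], size=(n, n))

    def unnormalized_logp(x):
        """Unnormalized log-probability of a +/-1 spin grid x of shape (n, n)."""
        horizontal = np.sum(x[:, :-1] * x[:, 1:])  # left/right neighbor couplings
        vertical = np.sum(x[:-1, :] * x[1:, :])    # up/down neighbor couplings
        return J * (horizontal + vertical) + np.sum(unaries * x)

    x = rng.choice([-1, 1], size=(n, n))           # a random spin configuration
    print(unnormalized_logp(x))

Computing the normalized likelihood would additionally require the log-partition function, which is feasible by enumeration only because the grid is this small.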
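The 1px shift augmentation quoted under Open Datasets is simple enough to sketch as well. This is a guess at the preprocessing rather than the authors' code: whether the shifted-in border is zero-filled is an assumption, and the function name is illustrative.

    import numpy as np

    def shift_augment(images):
        """images: array of shape (N, 8, 8). Returns the originals plus copies
        shifted 1px left, right, up, and down (a five-fold larger dataset)."""
        out = [images]
        for axis, step in [(2, -1), (2, 1), (1, -1), (1, 1)]:  # left, right, up, down
            shifted = np.roll(images, step, axis=axis)
            # Zero-fill the row/column that np.roll wrapped around (an assumption).
            index = 0 if step == 1 else -1
            if axis == 2:
                shifted[:, :, index] = 0
            else:
                shifted[:, index, :] = 0
            out.append(shifted)
        return np.concatenate(out, axis=0)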
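Finally, a sketch of the training configuration quoted under Experiment Setup: an RBM with 100 hidden units, ADAM with batch size 100, learning rate 3x10^-4, β1 = 0.9, β2 = 0.999, and q a uniform mixture of K = 10 Bernoulli distributions. The paper used Theano; PyTorch is substituted here for readability, the RBM parameterization and the sampler for q are illustrative, and nothing below implements the paper's variational objectives (Equations 6 and 7) or the 200-epoch training loop.

    import torch

    n_visible, n_hidden, K = 64, 100, 10       # 8x8 UCI digits -> 64 visible units
    batch_size = 100

    # RBM parameters.
    W = torch.zeros(n_visible, n_hidden, requires_grad=True)
    b_v = torch.zeros(n_visible, requires_grad=True)
    b_h = torch.zeros(n_hidden, requires_grad=True)

    # q: uniform mixture of K Bernoulli distributions over the visible units,
    # parameterized by one logit vector per mixture component.
    q_logits = torch.zeros(K, n_visible, requires_grad=True)

    optimizer = torch.optim.Adam(
        [W, b_v, b_h, q_logits], lr=3e-4, betas=(0.9, 0.999)
    )

    def free_energy(v):
        """Standard RBM free energy: F(v) = -v.b_v - sum_j softplus((v W + b_h)_j)."""
        return -(v @ b_v) - torch.nn.functional.softplus(v @ W + b_h).sum(dim=1)

    def sample_q(num_samples):
        """Draw visible configurations from q: pick a component uniformly at
        random, then sample each pixel from that component's Bernoulli."""
        components = torch.randint(K, (num_samples,))
        probs = torch.sigmoid(q_logits[components])
        return torch.bernoulli(probs)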