Neural Variational Inference and Learning in Undirected Graphical Models
Authors: Volodymyr Kuleshov, Stefano Ermon
NeurIPS 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We empirically demonstrate the effectiveness of our method on several popular generative modeling datasets. We start with an experiment aimed at visualizing the importance of tracking the target distribution p using q during learning. We use Equation 6 to optimize the likelihood of a 5x5 Ising MRF with coupling factor J and unaries chosen randomly in {-10^-2, 10^-2}. Next, we use our method to train Restricted Boltzmann Machines (RBMs) on the UCI digits dataset [30]. Next, we use the variational objective (7) to learn two types of hybrid directed/undirected models: a variational autoencoder (VAE) and an auxiliary variable deep generative model (ADGM) [18]. (A hedged sketch of this Ising model appears after the table.) |
| Researcher Affiliation | Academia | Volodymyr Kuleshov, Stanford University, Stanford, CA 94305, kuleshov@cs.stanford.edu; Stefano Ermon, Stanford University, Stanford, CA 94305, ermon@cs.stanford.edu |
| Pseudocode | No | The paper describes the methods textually and mathematically, but it does not include any explicit pseudocode blocks or algorithms. |
| Open Source Code | No | The paper states, 'Both PCD and our method were implemented in Theano [32].' This refers to a third-party tool that was used, not to a release of the authors' own source code. |
| Open Datasets | Yes | Next, we use our method to train Restricted Boltzmann Machines (RBMs) on the UCI digits dataset [30], which contains 10,992 8x8 images of handwritten digits; we augment this data by moving each image 1px to the left, right, up, and down. We show in Table 1 the test set negative log-likelihoods on the binarized MNIST [33] and 28x28 Omniglot [17] datasets. (A sketch of this augmentation appears after the table.) |
| Dataset Splits | No | The paper names the datasets used (UCI digits, MNIST, Omniglot) but does not specify how they were split into training, validation, or test sets (e.g., percentages or exact counts). It reports test-set negative log-likelihoods without describing how the test split was constructed. |
| Hardware Specification | No | The paper does not specify any hardware used for running the experiments (e.g., GPU or CPU models, memory, or cloud computing instances). |
| Software Dependencies | No | The paper mentions 'Both PCD and our method were implemented in Theano [32]' and 'ADAM [31]' but does not provide version numbers for these software components, which would be needed to reproduce the software environment exactly. |
| Experiment Setup | Yes | We train an RBM with 100 hidden units using ADAM [31] with batch size 100, a learning rate of 3x10^-4, β1 = 0.9, and β2 = 0.999; we choose q to be a uniform mixture of K = 10 Bernoulli distributions. We train all neural networks for 200 epochs with ADAM (same parameters as above) and neural variational inference (NVIL) with control variates as described in Mnih and Rezende [9]. (A hedged configuration sketch appears after the table.) |
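
The Research Type row quotes the paper's 5x5 Ising MRF experiment. The following is a minimal sketch of such a model, not the authors' code: it assumes a standard grid parameterization with pairwise couplings of strength J and unary potentials drawn uniformly from {-10^-2, 10^-2}; the value of J below is illustrative, and the paper's exact energy convention may differ.

```python
# Hedged sketch of a 5x5 Ising MRF (NumPy only; not the authors' Theano code).
# Assumes spins x in {-1, +1}, pairwise couplings of strength J on grid edges,
# and unary potentials drawn uniformly from {-1e-2, 1e-2} as quoted above.
import numpy as np

rng = np.random.default_rng(0)
n = 5
J = 0.5                                          # illustrative coupling strength
unary = rng.choice([-1e-2, 1e-2], size=(n, n))   # random unary potentials

def log_p_tilde(x):
    """Unnormalized log-probability of a {-1,+1}-valued n x n configuration x."""
    horiz = np.sum(x[:, :-1] * x[:, 1:])   # couplings along rows
    vert = np.sum(x[:-1, :] * x[1:, :])    # couplings along columns
    return J * (horiz + vert) + np.sum(unary * x)

# Score a random spin configuration.
x = rng.choice([-1, 1], size=(n, n))
print(log_p_tilde(x))
```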
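
The Open Datasets row notes that each 8x8 UCI digit is shifted by 1px to the left, right, up, and down. A minimal NumPy sketch of that augmentation follows; the zero-filling at vacated borders is an assumption, since the paper does not say how edge pixels are handled.

```python
# Hedged sketch of the 1px-shift augmentation (NumPy only; not the authors' code).
import numpy as np

def shift(img, dx, dy):
    """Shift a 2D image by (dx, dy) pixels, filling vacated pixels with zeros."""
    h, w = img.shape
    out = np.zeros_like(img)
    out[max(0, dy):h + min(0, dy), max(0, dx):w + min(0, dx)] = \
        img[max(0, -dy):h + min(0, -dy), max(0, -dx):w + min(0, -dx)]
    return out

def augment(images):
    """Return the original 8x8 digits plus 1px shifts in the four directions."""
    offsets = [(0, 0), (-1, 0), (1, 0), (0, -1), (0, 1)]
    return np.concatenate([
        np.stack([shift(img, dx, dy) for img in images]) for dx, dy in offsets
    ])

# Example: 3 blank 8x8 images become 15 augmented images.
print(augment(np.zeros((3, 8, 8))).shape)  # (15, 8, 8)
```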
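
The Experiment Setup row gives the RBM training configuration: 100 hidden units, ADAM with batch size 100, learning rate 3x10^-4, β1 = 0.9, β2 = 0.999, and a proposal q that is a uniform mixture of K = 10 Bernoulli distributions. Below is a hedged sketch of that configuration written against PyTorch rather than the authors' Theano implementation; the parameter names and the logit parameterization of q are assumptions, and the sampling helper is illustrative only (it does not implement the paper's gradient estimator).

```python
# Hedged PyTorch sketch of the quoted RBM training configuration
# (the paper's implementation was in Theano; this is not the authors' code).
import torch

n_visible, n_hidden, K = 64, 100, 10   # 8x8 binary digits, 100 hidden units, K = 10 components

# RBM parameters: weight matrix plus visible/hidden biases (assumed parameterization).
W = torch.zeros(n_visible, n_hidden, requires_grad=True)
b_v = torch.zeros(n_visible, requires_grad=True)
b_h = torch.zeros(n_hidden, requires_grad=True)

# Proposal q: a uniform mixture of K Bernoulli distributions over the visible
# units, parameterized here by per-component logits (an assumption of this sketch).
q_logits = torch.zeros(K, n_visible, requires_grad=True)

# ADAM settings quoted in the table: lr = 3e-4, beta1 = 0.9, beta2 = 0.999.
optimizer = torch.optim.Adam([W, b_v, b_h, q_logits], lr=3e-4, betas=(0.9, 0.999))
batch_size, n_epochs = 100, 200

def sample_q(m):
    """Draw m visible samples from q: pick a component uniformly at random,
    then sample each pixel from that component's Bernoulli distribution."""
    comp = torch.randint(K, (m,))
    probs = torch.sigmoid(q_logits)[comp]   # (m, n_visible)
    return torch.bernoulli(probs)
```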