Improving Explorability in Variational Inference with Annealed Variational Objectives

Authors: Chin-Wei Huang, Shawn Tan, Alexandre Lacoste, Aaron C. Courville

NeurIPS 2018

Reproducibility Variable | Result | LLM Response
Research Type | Experimental (5 experiments) | Our first experiment shows that having a unimodal approximate posterior is not universally benign.
Researcher Affiliation | Collaboration | MILA, University of Montreal; Element AI; CIFAR Fellow
Pseudocode | No | The paper describes the proposed method using mathematical equations and prose but does not include any explicit pseudocode blocks or algorithm listings.
Open Source Code | No | The paper does not provide any explicit statement about releasing source code or a link to a code repository for the described methodology.
Open Datasets | Yes | We train VAEs using a standard VAE (with a Gaussian approximate posterior), HVI, and HVI with AVO on the Binarized MNIST dataset from Larochelle and Murray (2011) and the Omniglot dataset as used in Burda et al. (2016).
Dataset Splits | No | The paper reports training, validation, and test likelihoods (L_tr, L_va, L_te) in Table 1, implying the use of splits, but it does not give split percentages, sample counts, or the methodology for generating these splits beyond referencing standard datasets.
Hardware Specification | No | The paper does not provide specific details about the hardware used to run the experiments, such as GPU models, CPU types, or memory configurations.
Software Dependencies | No | The paper mentions software components implicitly (e.g., 'MLP for both the encoder and decoder', 'gating activation as in Tomczak and Welling (2016)') but does not list specific software dependencies with version numbers required for reproduction.
Experiment Setup | Yes | We run 10 trials for each energy function, and do 2000 stochastic updates with a batch size of 64 and a learning rate of 0.001. We used hyperparameter search to determine batch size, learning rate, and the beta-annealing schedule. Both the encoder and decoder have 2 hidden layers, each with 300 hidden units. For MNIST, we used a dimension of 40 for the latent space, and 200 dimensions in Omniglot. (An illustrative sketch of this configuration follows the table.)
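The Experiment Setup row pins down the architecture and optimization settings reported in the paper (MLP encoder/decoder with 2 hidden layers of 300 units, latent dimension 40 for MNIST and 200 for Omniglot, batch size 64, learning rate 0.001). Below is a minimal sketch, not the authors' code, of a standard Gaussian-posterior VAE consistent with those numbers; the ReLU activations, Bernoulli decoder, and Adam optimizer are assumptions made here for illustration (the paper itself mentions gating activations as in Tomczak and Welling (2016)).

```python
# Illustrative VAE matching the reported hyperparameters (all names are hypothetical).
import torch
import torch.nn as nn

class VAE(nn.Module):
    def __init__(self, x_dim=784, h_dim=300, z_dim=40):  # z_dim=200 for Omniglot
        super().__init__()
        # Encoder: 2 hidden layers of 300 units, outputting Gaussian mean and log-variance.
        self.encoder = nn.Sequential(
            nn.Linear(x_dim, h_dim), nn.ReLU(),
            nn.Linear(h_dim, h_dim), nn.ReLU(),
        )
        self.enc_mu = nn.Linear(h_dim, z_dim)
        self.enc_logvar = nn.Linear(h_dim, z_dim)
        # Decoder: 2 hidden layers of 300 units, outputting Bernoulli logits over pixels.
        self.decoder = nn.Sequential(
            nn.Linear(z_dim, h_dim), nn.ReLU(),
            nn.Linear(h_dim, h_dim), nn.ReLU(),
            nn.Linear(h_dim, x_dim),
        )

    def forward(self, x):
        h = self.encoder(x)
        mu, logvar = self.enc_mu(h), self.enc_logvar(h)
        z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)  # reparameterization trick
        logits = self.decoder(z)
        # Negative ELBO: Bernoulli reconstruction term plus analytic Gaussian KL.
        recon = nn.functional.binary_cross_entropy_with_logits(
            logits, x, reduction="none").sum(-1)
        kl = -0.5 * (1 + logvar - mu.pow(2) - logvar.exp()).sum(-1)
        return (recon + kl).mean()

model = VAE(z_dim=40)                                  # 40-dim latent space for MNIST
opt = torch.optim.Adam(model.parameters(), lr=1e-3)    # learning rate 0.001 (optimizer assumed)
x = torch.rand(64, 784).bernoulli()                    # placeholder batch of 64 binarized images
loss = model(x)
loss.backward()
opt.step()
```

Under these assumptions, the reported setup amounts to repeating the last four lines for 2000 stochastic updates; the HVI and AVO variants discussed in the paper would replace the Gaussian approximate posterior above, and are not sketched here.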