Hamiltonian Variational Auto-Encoder

Authors: Anthony L. Caterini, Arnaud Doucet, Dino Sejdinovic

NeurIPS 2018

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this section, we discuss the experiments used to validate our method. We first test HVAE on an example with a tractable full log likelihood (where no neural networks are needed), and then perform larger-scale tests on the MNIST dataset.
Researcher Affiliation | Academia | Anthony L. Caterini¹, Arnaud Doucet¹,², Dino Sejdinovic¹,². ¹Department of Statistics, University of Oxford; ²Alan Turing Institute for Data Science
Pseudocode | Yes | Algorithm 1: Hamiltonian ELBO, Fixed Tempering (a minimal sketch of a tempered leapfrog step appears after this table)
Open Source Code | Yes | Code is available online.⁴
Open Datasets | Yes | The next experiment that we consider is using HVAE to improve upon a convolutional variational auto-encoder (VAE) for the binarized MNIST handwritten digit dataset. (...) We use the standard stochastic binarization of MNIST [24] as training data (a binarization sketch appears after this table).
Dataset Splits | Yes | We also employ early stopping by halting the training procedure if there is no improvement in the loss on validation data over 100 epochs.
Hardware Specification | No | The paper does not specify the hardware used for the experiments (GPU/CPU models, processor types, or memory amounts); it only notes that models were trained using TensorFlow.
Software Dependencies | No | The paper states 'All models were trained using TensorFlow [1]' but gives no version number for TensorFlow or for any other software dependency.
Experiment Setup | Yes | All experiments have N = 10,000 and all training was done using RMSProp [27] with a learning rate of 10⁻³. (...) We train using Adamax [14] with learning rate 10⁻³. We also employ early stopping by halting the training procedure if there is no improvement in the loss on validation data over 100 epochs. (...) The inference network consists of three convolutional layers, each with filters of size 5 × 5 and a stride of 2. The convolutional layers output 16, 32, and 32 feature maps, respectively. The output of the third layer is fed into a fully-connected layer with hidden dimension n_h = 450, whose output is then fully connected to the output means and standard deviations, each of size ℓ. Softplus activation functions are used throughout the network except immediately before the outputted mean. (An encoder sketch appears after this table.)
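
For a concrete picture of "Algorithm 1: Hamiltonian ELBO, Fixed Tempering" referenced in the Pseudocode row, the following is a minimal sketch of a single leapfrog step with momentum tempering. The standard-normal toy target, the function names, and the per-step tempering factor alpha are illustrative assumptions, not the authors' implementation.

import numpy as np

def grad_log_target(z):
    # Toy target: standard normal, so grad log p(z) = -z.
    return -z

def tempered_leapfrog(z, rho, step_size, alpha):
    # One leapfrog step on (position z, momentum rho), followed by momentum
    # tempering; alpha = 1 recovers plain, untempered leapfrog.
    rho = rho + 0.5 * step_size * grad_log_target(z)  # half momentum step
    z = z + step_size * rho                           # full position step
    rho = rho + 0.5 * step_size * grad_log_target(z)  # half momentum step
    return z, alpha * rho                             # fixed tempering of the momentum

# Usage: a few tempered leapfrog steps from an arbitrary starting sample.
rng = np.random.default_rng(0)
z, rho = rng.normal(size=2), rng.normal(size=2)
for _ in range(10):
    z, rho = tempered_leapfrog(z, rho, step_size=0.1, alpha=0.98)

In the paper the flow is deterministic given the initial sample, so the ELBO can account for the Jacobian of each step; the sketch above shows only the dynamics, not the bound itself.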
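
The "standard stochastic binarization of MNIST" quoted in the Open Datasets row can be reproduced along the following lines; the tf.keras data loader and the per-epoch resampling are assumptions, since the quoted text does not spell out those details.

import numpy as np
import tensorflow as tf

(x_train, _), (x_test, _) = tf.keras.datasets.mnist.load_data()
x_train = x_train.astype("float32") / 255.0  # grayscale intensities in [0, 1]

def stochastic_binarize(x, rng):
    # Each pixel is set to 1 with probability equal to its grayscale intensity.
    return (rng.uniform(size=x.shape) < x).astype("float32")

rng = np.random.default_rng(0)
x_train_bin = stochastic_binarize(x_train, rng)  # typically resampled every epoch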
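
Below is a sketch of the inference network and training options described in the Experiment Setup row, written with tf.keras. The latent dimension, input shape, and "same" padding are assumptions; the quoted text specifies only the 5 × 5 filters with stride 2, the 16/32/32 feature maps, the hidden width n_h = 450, the softplus activations, Adamax with learning rate 10⁻³, and early stopping after 100 epochs without validation improvement.

import tensorflow as tf
from tensorflow.keras import layers

LATENT_DIM = 64  # assumed value; the paper only says the outputs have the latent dimension

inputs = tf.keras.Input(shape=(28, 28, 1))
h = layers.Conv2D(16, 5, strides=2, padding="same", activation="softplus")(inputs)
h = layers.Conv2D(32, 5, strides=2, padding="same", activation="softplus")(h)
h = layers.Conv2D(32, 5, strides=2, padding="same", activation="softplus")(h)
h = layers.Flatten()(h)
h = layers.Dense(450, activation="softplus")(h)           # hidden dimension n_h = 450
mean = layers.Dense(LATENT_DIM)(h)                        # linear: no activation before the mean
std = layers.Dense(LATENT_DIM, activation="softplus")(h)  # softplus keeps the std positive
encoder = tf.keras.Model(inputs, [mean, std])

optimizer = tf.keras.optimizers.Adamax(learning_rate=1e-3)                       # as quoted for MNIST
early_stop = tf.keras.callbacks.EarlyStopping(monitor="val_loss", patience=100)  # 100-epoch patience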