Hamiltonian Variational Auto-Encoder
Authors: Anthony L. Caterini, Arnaud Doucet, Dino Sejdinovic
NeurIPS 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we discuss the experiments used to validate our method. We first test HVAE on an example with a tractable full log likelihood (where no neural networks are needed), and then perform larger-scale tests on the MNIST dataset. |
| Researcher Affiliation | Academia | Anthony L. Caterini¹, Arnaud Doucet¹,², Dino Sejdinovic¹,²; ¹Department of Statistics, University of Oxford; ²Alan Turing Institute for Data Science |
| Pseudocode | Yes | Algorithm 1 Hamiltonian ELBO, Fixed Tempering (a hedged sketch of the leapfrog-based ELBO estimator appears after this table) |
| Open Source Code | Yes | Code is available online. |
| Open Datasets | Yes | The next experiment that we consider is using HVAE to improve upon a convolutional variational auto-encoder (VAE) for the binarized MNIST handwritten digit dataset. (...) We use the standard stochastic binarization of MNIST [24] as training data |
| Dataset Splits | Yes | We also employ early stopping by halting the training procedure if there is no improvement in the loss on validation data over 100 epochs. |
| Hardware Specification | No | The paper does not provide specific hardware details such as exact GPU/CPU models, processor types, or memory amounts used for running the experiments. It only mentions that models were trained using TensorFlow. |
| Software Dependencies | No | The paper states 'All models were trained using TensorFlow [1]' but does not provide a specific version number for TensorFlow or any other software dependencies with version details. |
| Experiment Setup | Yes | All experiments have N = 10,000 and all training was done using RMSProp [27] with a learning rate of 10⁻³. (...) We train using Adamax [14] with learning rate 10⁻³. We also employ early stopping by halting the training procedure if there is no improvement in the loss on validation data over 100 epochs. (...) The inference network consists of three convolutional layers, each with filters of size 5 × 5 and a stride of 2. The convolutional layers output 16, 32, and 32 feature maps, respectively. The output of the third layer is fed into a fully-connected layer with hidden dimension nh = 450, whose output is then fully connected to the output means and standard deviations each of size . Softplus activation functions are used throughout the network except immediately before the outputted mean. (A hedged reconstruction of this inference network appears after the table.) |
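
The pseudocode row refers to Algorithm 1 (Hamiltonian ELBO, Fixed Tempering). The sketch below is not the authors' code; it illustrates only the core idea in plain NumPy: sample (z₀, ρ₀) from the variational distribution and a standard-normal momentum, flow them through a volume-preserving leapfrog integrator on U(z) = −log p(x, z), and form an unbiased ELBO estimate from the start and end densities. The fixed-tempering rescaling of the momentum used in Algorithm 1 is omitted, and `log_p_xz`, `grad_log_p_xz`, `mu`, and `sigma` are assumed inputs.

```python
# Minimal sketch (not the paper's Algorithm 1): Hamiltonian ELBO estimate
# without tempering. Leapfrog is volume-preserving, so no Jacobian term.
import numpy as np

def leapfrog(z, rho, step_size, n_steps, grad_log_p):
    """Run K leapfrog steps on U(z) = -log p(x, z)."""
    for _ in range(n_steps):
        rho = rho + 0.5 * step_size * grad_log_p(z)  # half momentum step
        z = z + step_size * rho                      # full position step
        rho = rho + 0.5 * step_size * grad_log_p(z)  # half momentum step
    return z, rho

def log_normal(x, mean, std):
    """Log density of a diagonal Gaussian, summed over dimensions."""
    return np.sum(-0.5 * np.log(2 * np.pi) - np.log(std)
                  - 0.5 * ((x - mean) / std) ** 2)

def hamiltonian_elbo(x, mu, sigma, log_p_xz, grad_log_p_xz,
                     step_size=0.1, n_steps=5, rng=np.random):
    """Single-sample ELBO: deterministic leapfrog flow of (z0, rho0),
    corrected by the initial variational and momentum densities."""
    z0 = mu + sigma * rng.standard_normal(mu.shape)   # reparameterised z0
    rho0 = rng.standard_normal(mu.shape)              # auxiliary momentum
    zK, rhoK = leapfrog(z0, rho0, step_size, n_steps,
                        lambda z: grad_log_p_xz(x, z))
    zeros, ones = np.zeros_like(mu), np.ones_like(mu)
    return (log_p_xz(x, zK)
            + log_normal(rhoK, zeros, ones)   # end momentum under N(0, I)
            - log_normal(z0, mu, sigma)       # initial variational density
            - log_normal(rho0, zeros, ones))  # initial momentum density
```

Because the leapfrog map has unit Jacobian, the exponential of this quantity is an unbiased estimate of p(x), so its expectation lower-bounds log p(x); adding the fixed tempering schedule of Algorithm 1 would introduce momentum rescalings and the corresponding correction terms.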
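For the experiment-setup row, the following is a hedged reconstruction of the quoted MNIST inference network written with tf.keras (the paper only states that TensorFlow was used, without a version). The latent dimension is elided in the extract, so `latent_dim` here is an assumption.

```python
# Hedged sketch of the quoted inference network, not the authors' code.
import tensorflow as tf

def make_inference_net(latent_dim=64):  # latent_dim is an assumed value
    """Three 5x5 stride-2 conv layers (16/32/32 maps), a 450-unit dense
    layer, then separate heads for the mean (no activation) and the
    softplus standard deviation, as described in the quoted setup."""
    x = tf.keras.Input(shape=(28, 28, 1))
    h = tf.keras.layers.Conv2D(16, 5, strides=2, padding="same",
                               activation="softplus")(x)
    h = tf.keras.layers.Conv2D(32, 5, strides=2, padding="same",
                               activation="softplus")(h)
    h = tf.keras.layers.Conv2D(32, 5, strides=2, padding="same",
                               activation="softplus")(h)
    h = tf.keras.layers.Flatten()(h)
    h = tf.keras.layers.Dense(450, activation="softplus")(h)
    mean = tf.keras.layers.Dense(latent_dim)(h)                  # mean head
    std = tf.keras.layers.Dense(latent_dim, activation="softplus")(h)
    return tf.keras.Model(inputs=x, outputs=[mean, std])
```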