Discrete Variational Autoencoders
Authors: Jason Tyler Rolfe
ICLR 2017 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We present a novel method to train a class of probabilistic models with discrete latent variables using the variational autoencoder framework, including backpropagation through the discrete latent variables. The associated class of probabilistic models comprises an undirected discrete component and a directed hierarchical continuous component. The discrete component captures the distribution over the disconnected smooth manifolds induced by the continuous component. As a result, this class of models efficiently learns both the class of objects in an image, and their specific realization in pixels, from unsupervised data; and outperforms state-of-the-art methods on the permutation-invariant MNIST, Omniglot, and Caltech-101 Silhouettes datasets. (A hedged sketch of the smoothing-based reparameterization behind "backpropagation through the discrete latent variables" appears below the table.) |
| Researcher Affiliation | Industry | Jason Tyler Rolfe, D-Wave Systems, Burnaby, BC V5G-4M9, Canada. jrolfe@dwavesys.com |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide a direct link or explicit statement about the availability of the source code for the described methodology. Footnote 8 provides a link to a *dataset* used by a different paper. |
| Open Datasets | Yes | We train the resulting discrete VAEs on the permutation-invariant MNIST (Le Cun et al., 1998), Omniglot (Lake et al., 2013), and Caltech-101 Silhouettes datasets (Marlin et al., 2010). For MNIST, we use both the static binarization of Salakhutdinov & Murray (2008) and dynamic binarization. ... We use the partitioned, preprocessed Omniglot dataset of Burda et al. (2016), available from https://github.com/yburda/iwae/tree/master/datasets/OMNIGLOT. (See the dynamic-binarization sketch below the table.) |
| Dataset Splits | No | The paper mentions training and testing on datasets but does not explicitly provide specific details about train/validation/test splits (e.g., percentages or sample counts) needed for reproduction. |
| Hardware Specification | No | The paper mentions 'a custom GPU acceleration library used for an earlier version of the code' in the acknowledgements, but it does not specify the exact GPU model or any other hardware components (CPU, RAM, etc.) used for the experiments described in the paper. |
| Software Dependencies | No | The paper mentions using 'ADAM' for optimization and 'TensorFlow' but does not provide specific version numbers for these or any other software dependencies. |
| Experiment Setup | Yes | All hyperparameters were tuned via manual experimentation. Except in Figure 6, RBMs have 128 units (64 units per side, with full bipartite connections between the two sides), with 4 layers of hierarchy in the approximating posterior. We use 100 iterations of block Gibbs sampling, with 20 persistent chains per element of the minibatch, to sample from the prior in the stochastic approximation to Equation 11. ... All neural networks implementing components of the approximating posterior contain two hidden layers of 2000 units. ... We use warm-up with strength 20 for 5 epochs, and additional warm-up of strength 2 on the RBM alone for 20 epochs. (See the block Gibbs sampling sketch below the table.) |
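
The "backpropagation through the discrete latent variables" quoted in the Research Type row works by smoothing each binary latent into a continuous variable. The sketch below uses a spike-and-exponential smoothing with an inverse-CDF reparameterization; this specific form, the function name `smoothed_sample`, and the NumPy implementation are a reconstruction for illustration, not details quoted from the paper.

```python
import numpy as np

def smoothed_sample(q, beta=5.0, rng=None):
    """Reparameterized draw of a smoothed binary latent (hypothetical sketch).

    q    : encoder probability q(z=1 | x), scalar or array in [0, 1]
    beta : sharpness of the assumed exponential smoothing on [0, 1]

    Assumed smoothing: r(zeta | z=0) is a spike at zeta = 0, and
    r(zeta | z=1) is proportional to exp(beta * zeta) on [0, 1].  Inverting the
    mixture CDF at a uniform draw rho makes zeta a piecewise-differentiable
    function of q, so gradients can flow back into the encoder.
    """
    rng = np.random.default_rng(0) if rng is None else rng
    rho = rng.uniform(size=np.shape(q))
    # rho < 1 - q corresponds to z = 0: zeta collapses onto the spike at 0.
    active = rho >= 1.0 - q
    # Inverse CDF of the exponential branch (only meaningful where active).
    zeta = np.where(
        active,
        np.log1p(np.clip(rho - (1.0 - q), 0.0, None)
                 / np.maximum(q, 1e-12) * (np.exp(beta) - 1.0)) / beta,
        0.0,
    )
    return zeta
```

In a full model, `zeta` would replace the binary latent as input to the continuous decoder, while the RBM prior over the underlying binary variables is handled by the separate sampling term quoted in the Experiment Setup row.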
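
The Open Datasets row quotes both the static binarization of Salakhutdinov & Murray (2008) and dynamic binarization for MNIST. A minimal sketch of dynamic binarization, resampling binary pixels from the grey-level intensities each time a batch is drawn, is below; the function name and array layout are assumptions.

```python
import numpy as np

def dynamically_binarize(grey_images, rng):
    """Sample a fresh binary image from Bernoulli(pixel intensity).

    grey_images : float array with values in [0, 1], e.g. MNIST pixels / 255.
    Called per minibatch, so the model never sees a single fixed binarization.
    """
    return (rng.uniform(size=grey_images.shape) < grey_images).astype(np.float32)

# Hypothetical usage with a (num_examples, 784) array of MNIST intensities:
# rng = np.random.default_rng(0)
# batch = dynamically_binarize(train_images[:100], rng)
```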
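
The Experiment Setup row specifies an RBM with 64 units per side, full bipartite connections, and 100 iterations of block Gibbs sampling over 20 persistent chains per minibatch element to sample from the prior. The sketch below implements plain persistent block Gibbs sampling for such a bipartite RBM under the standard sigmoid conditionals; the variable names and energy convention are assumptions, not taken from the paper.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def block_gibbs(W, b_left, b_right, chains, n_steps=100, rng=None):
    """Persistent block Gibbs sampling on a bipartite RBM (sketch).

    W       : (64, 64) couplings between the two sides
    b_left  : (64,) biases of the left side
    b_right : (64,) biases of the right side
    chains  : (n_chains, 64) persistent binary states of the left side
              (e.g. 20 chains per element of the minibatch)
    Returns the updated left and right states after n_steps block updates.
    """
    rng = np.random.default_rng(0) if rng is None else rng
    left = chains
    for _ in range(n_steps):
        # Sample the whole right side given the left side, then vice versa.
        p_right = sigmoid(left @ W + b_right)
        right = (rng.uniform(size=p_right.shape) < p_right).astype(np.float64)
        p_left = sigmoid(right @ W.T + b_left)
        left = (rng.uniform(size=p_left.shape) < p_left).astype(np.float64)
    return left, right
```

The returned samples would serve as the negative-phase term in a stochastic estimate of the gradient of the RBM's log partition function, with the chains carried over between parameter updates (persistence).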