Discrete Variational Autoencoders
Authors: Jason Tyler Rolfe
ICLR 2017 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We present a novel method to train a class of probabilistic models with discrete latent variables using the variational autoencoder framework, including backpropagation through the discrete latent variables. The associated class of probabilistic models comprises an undirected discrete component and a directed hierarchical continuous component. The discrete component captures the distribution over the disconnected smooth manifolds induced by the continuous component. As a result, this class of models efficiently learns both the class of objects in an image, and their specific realization in pixels, from unsupervised data; and outperforms state-of-the-art methods on the permutation-invariant MNIST, Omniglot, and Caltech-101 Silhouettes datasets. (A hedged sketch of the smoothing-based reparameterization behind "backpropagation through the discrete latent variables" appears below the table.) |
| Researcher Affiliation | Industry | Jason Tyler Rolfe, D-Wave Systems, Burnaby, BC V5G-4M9, Canada. jrolfe@dwavesys.com |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide a direct link or explicit statement about the availability of the source code for the described methodology. Footnote 8 provides a link to a *dataset* used by a different paper. |
| Open Datasets | Yes | We train the resulting discrete VAEs on the permutation-invariant MNIST (Le Cun et al., 1998), Omniglot (Lake et al., 2013), and Caltech-101 Silhouettes datasets (Marlin et al., 2010). For MNIST, we use both the static binarization of Salakhutdinov & Murray (2008) and dynamic binarization. ... We use the partitioned, preprocessed Omniglot dataset of Burda et al. (2016), available from https://github.com/yburda/iwae/tree/master/datasets/OMNIGLOT. (See the dynamic-binarization sketch below the table.) |
| Dataset Splits | No | The paper mentions training and testing on datasets but does not explicitly provide specific details about train/validation/test splits (e.g., percentages or sample counts) needed for reproduction. |
| Hardware Specification | No | The paper mentions 'a custom GPU acceleration library used for an earlier version of the code' in the acknowledgements, but it does not specify the exact GPU model or any other hardware components (CPU, RAM, etc.) used for the experiments described in the paper. |
| Software Dependencies | No | The paper mentions using 'ADAM' for optimization and 'TensorFlow' but does not provide specific version numbers for these or any other software dependencies. |
| Experiment Setup | Yes | All hyperparameters were tuned via manual experimentation. Except in Figure 6, RBMs have 128 units (64 units per side, with full bipartite connections between the two sides), with 4 layers of hierarchy in the approximating posterior. We use 100 iterations of block Gibbs sampling, with 20 persistent chains per element of the minibatch, to sample from the prior in the stochastic approximation to Equation 11. ... All neural networks implementing components of the approximating posterior contain two hidden layers of 2000 units. ... We use warm-up with strength 20 for 5 epochs, and additional warm-up of strength 2 on the RBM alone for 20 epochs. (See the block Gibbs sampling sketch below the table.) |
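
The "backpropagation through the discrete latent variables" quoted in the Research Type row works by smoothing each binary latent into a continuous variable. The sketch below uses a spike-and-exponential smoothing with an inverse-CDF reparameterization; this specific form, the function name `smoothed_sample`, and the NumPy implementation are a reconstruction for illustration, not details quoted from the paper.

```python
import numpy as np

def smoothed_sample(q, beta=5.0, rng=None):
    """Reparameterized draw of a smoothed binary latent (hypothetical sketch).

    q    : encoder probability q(z=1 | x), scalar or array in [0, 1]
    beta : sharpness of the assumed exponential smoothing on [0, 1]

    Assumed smoothing: r(zeta | z=0) is a spike at zeta = 0, and
    r(zeta | z=1) is proportional to exp(beta * zeta) on [0, 1].  Inverting the
    mixture CDF at a uniform draw rho makes zeta a piecewise-differentiable
    function of q, so gradients can flow back into the encoder.
    """
    rng = np.random.default_rng(0) if rng is None else rng
    rho = rng.uniform(size=np.shape(q))
    # rho < 1 - q corresponds to z = 0: zeta collapses onto the spike at 0.
    active = rho >= 1.0 - q
    # Inverse CDF of the exponential branch (only meaningful where active).
    zeta = np.where(
        active,
        np.log1p(np.clip(rho - (1.0 - q), 0.0, None)
                 / np.maximum(q, 1e-12) * (np.exp(beta) - 1.0)) / beta,
        0.0,
    )
    return zeta
```

In a full model, `zeta` would replace the binary latent as input to the continuous decoder, while the RBM prior over the underlying binary variables is handled by the separate sampling term quoted in the Experiment Setup row.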
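
The Open Datasets row quotes both the static binarization of Salakhutdinov & Murray (2008) and dynamic binarization for MNIST. A minimal sketch of dynamic binarization, resampling binary pixels from the grey-level intensities each time a batch is drawn, is below; the function name and array layout are assumptions.

```python
import numpy as np

def dynamically_binarize(grey_images, rng):
    """Sample a fresh binary image from Bernoulli(pixel intensity).

    grey_images : float array with values in [0, 1], e.g. MNIST pixels / 255.
    Called per minibatch, so the model never sees a single fixed binarization.
    """
    return (rng.uniform(size=grey_images.shape) < grey_images).astype(np.float32)

# Hypothetical usage with a (num_examples, 784) array of MNIST intensities:
# rng = np.random.default_rng(0)
# batch = dynamically_binarize(train_images[:100], rng)
```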
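
The Experiment Setup row specifies an RBM with 64 units per side, full bipartite connections, and 100 iterations of block Gibbs sampling over 20 persistent chains per minibatch element to sample from the prior. The sketch below implements plain persistent block Gibbs sampling for such a bipartite RBM under the standard sigmoid conditionals; the variable names and energy convention are assumptions, not taken from the paper.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def block_gibbs(W, b_left, b_right, chains, n_steps=100, rng=None):
    """Persistent block Gibbs sampling on a bipartite RBM (sketch).

    W       : (64, 64) couplings between the two sides
    b_left  : (64,) biases of the left side
    b_right : (64,) biases of the right side
    chains  : (n_chains, 64) persistent binary states of the left side
              (e.g. 20 chains per element of the minibatch)
    Returns the updated left and right states after n_steps block updates.
    """
    rng = np.random.default_rng(0) if rng is None else rng
    left = chains
    for _ in range(n_steps):
        # Sample the whole right side given the left side, then vice versa.
        p_right = sigmoid(left @ W + b_right)
        right = (rng.uniform(size=p_right.shape) < p_right).astype(np.float64)
        p_left = sigmoid(right @ W.T + b_left)
        left = (rng.uniform(size=p_left.shape) < p_left).astype(np.float64)
    return left, right
```

The returned samples would serve as the negative-phase term in a stochastic estimate of the gradient of the RBM's log partition function, with the chains carried over between parameter updates (persistence).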