Hierarchical Decompositional Mixtures of Variational Autoencoders
Authors: Ping Liang Tan, Robert Peharz
ICML 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In experiments we show that our models outperform classical VAEs on almost all of our experimental benchmarks. Moreover, we show that our model is highly data efficient and degrades very gracefully in extremely low data regimes. We compare SPVAEs with classical VAEs on the MNIST, CIFAR-10, and SVHN image datasets. For each dataset, we consider two versions: i) interpreting images as continuous signals and using Gaussians for the VAE outputs, and ii) interpreting images as discrete data on {0, ..., 255} and using Binomial distributions as VAE outputs. On all of these 6 benchmarks, SPVAEs clearly outperform classical VAEs in terms of test likelihood (estimated with 5000 importance-weighted samples). At the same time, due to their decompositional nature, SPVAE models are almost an order of magnitude smaller than VAEs. Moreover, we show that SPVAEs are more data efficient than VAEs: on all benchmarks we can reduce the amount of training data down to 10%, without significantly deteriorating the test performance. Even for extremely low data regimes, SPVAEs degrade much more gracefully than VAEs. Table 1. Performance on test set, 5000-sample IWAE ELBO. Figure 2. Degradation of test ELBO as training set size is reduced. (A sketch of the importance-weighted ELBO estimator appears after this table.) |
| Researcher Affiliation | Collaboration | ¹Department of Engineering, University of Cambridge, UK; ²DSO National Laboratories, Singapore. Correspondence to: Ping Liang Tan <plt28j@gmail.com>, Robert Peharz <rp587@cam.ac.uk>. |
| Pseudocode | Yes | Algorithm 1 Stochastic Variational EM for SPVAE |
| Open Source Code | Yes | Code available under https://github.com/cambridge-mlg/SPVAE. |
| Open Datasets | Yes | We compare SPVAEs with classical VAEs on the MNIST (LeCun et al.), CIFAR-10 (Krizhevsky, 2009), and SVHN (Netzer et al., 2011) image datasets. |
| Dataset Splits | Yes | For MNIST, we randomly selected 10k images from the training set for the validation set; for CIFAR-10, the 60k images were randomly divided into sets of sizes 40k / 10k / 10k, which were used as training, validation, and test sets. For SVHN, we used the first 26032 images from the extra set as validation set (i.e. of the same size as the test set). (A sketch of these splits is given after this table.) |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running its experiments; it only mentions the software framework TensorFlow. |
| Software Dependencies | No | We implemented all models in Tensorflow (Abadi M. et al., 2015) and used Adam (Kingma & Ba, 2015) with its default parameters for optimizing the respective ELBOs. The paper mentions TensorFlow and the Adam optimizer but does not provide version numbers for these or any other software components. |
| Experiment Setup | Yes | During training, we consistently used 5 importance weighted samples for ELBO estimates (2). We used a batch size of 128 throughout all our experiments. The quality of density estimation in VAEs and SPVAEs depends both on the model size and the dimensionality of the latent codes. Thus, we treated the number of hidden units H per neural network layer and the dimensionality nz of latent VAE codes as hyper-parameters, and cross-validated them on a validation set. For MNIST, we randomly selected 10k images from the training set for the validation set; for CIFAR-10, the 60k images were randomly divided into sets of sizes 40k / 10k / 10k, which were used as training, validation, and test sets. For SVHN, we used the first 26032 images from the extra set as validation set (i.e. of the same size as the test set). The same H was used for each layer (decoder and encoder), and in the case of SPVAEs, for each VAE leaf. In order to keep the sizes of the overall models comparable, we used ranges nz ∈ {1, 2, 5, 25, 50, 100, 150, 200}, H ∈ {30, 100, 300, 600} for VAEs, and nz ∈ {1, 2, 5, 25, 50}, H ∈ {8, 16, 32} for SPVAEs. No regularization was applied, but we used early stopping in order to prevent overfitting. In particular, we evaluated the training progress on the validation set every 128 batches and stopped training if the performance on the validation set decreased five times consecutively. (A sketch of this early-stopping and grid-search protocol is given after this table.) |
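
The test likelihoods quoted above are 5000-sample importance-weighted ELBO (IWAE) estimates, and training uses the same estimator with 5 samples. As a rough illustration of how such a K-sample bound is computed, here is a minimal NumPy sketch; the function name `iwae_bound` and the (K, batch) array layout are our own assumptions, not code from the SPVAE repository.

```python
import numpy as np

def iwae_bound(log_p_xz, log_q_zx):
    """K-sample importance-weighted ELBO estimate (illustrative sketch).

    log_p_xz : (K, batch) array of log p(x, z_k) for samples z_k ~ q(z | x)
    log_q_zx : (K, batch) array of log q(z_k | x)
    Returns, per example, log (1/K) * sum_k p(x, z_k) / q(z_k | x),
    evaluated with a log-sum-exp trick for numerical stability.
    """
    log_w = log_p_xz - log_q_zx                  # unnormalised log importance weights
    m = np.max(log_w, axis=0, keepdims=True)     # stabiliser for the exponentials
    return m[0] + np.log(np.mean(np.exp(log_w - m), axis=0))
```

With K = 1 this reduces to the standard ELBO; the paper reports K = 5 during training and K = 5000 for the test numbers in Table 1.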
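
The split protocol in the "Dataset Splits" row amounts to a few lines of array indexing. The helpers below, assuming the images are available as NumPy arrays and using an arbitrary seed (the paper does not state one), mirror the described MNIST, CIFAR-10, and SVHN splits; they are a sketch, not the authors' code.

```python
import numpy as np

def split_train_val(images, val_size=10_000, seed=0):
    """MNIST-style split: hold out `val_size` randomly chosen images for validation."""
    idx = np.random.default_rng(seed).permutation(len(images))
    return images[idx[val_size:]], images[idx[:val_size]]

def split_40_10_10(images, seed=0):
    """CIFAR-10-style split: shuffle all 60k images and cut 40k / 10k / 10k."""
    idx = np.random.default_rng(seed).permutation(len(images))
    return images[idx[:40_000]], images[idx[40_000:50_000]], images[idx[50_000:]]

def svhn_validation(extra_images, val_size=26_032):
    """SVHN: use the first 26,032 images of the 'extra' split as validation,
    matching the size of the official test set."""
    return extra_images[:val_size]
```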
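
Finally, the training protocol in the "Experiment Setup" row (batch size 128, a validation check every 128 batches, stopping after five consecutive decreases, and a grid search over nz and H) fits in a short loop. The sketch below is our reading of that protocol: `model.train_step`, `eval_on_validation`, `build_spvae`, and the interpretation of "decreased five times consecutively" as five successive drops relative to the previous evaluation are assumptions rather than details from the released code.

```python
import itertools

# Hyper-parameter grids quoted above (SPVAE values; the VAE grids are larger)
NZ_GRID = [1, 2, 5, 25, 50]
H_GRID = [8, 16, 32]

def train_with_early_stopping(model, batches, eval_on_validation,
                              eval_every=128, patience=5):
    """Take gradient steps until the validation ELBO drops `patience` times in a row."""
    prev = float("-inf")
    drops = 0
    for step, batch in enumerate(batches, start=1):
        model.train_step(batch)                  # one Adam step with default parameters
        if step % eval_every == 0:
            score = eval_on_validation(model)
            drops = drops + 1 if score < prev else 0  # reset whenever the ELBO does not decrease
            prev = score
            if drops >= patience:                # five consecutive decreases: stop
                break
    return model

# Cross-validation over the grid (model construction left abstract):
# for nz, h in itertools.product(NZ_GRID, H_GRID):
#     model = build_spvae(nz=nz, hidden_units=h)        # hypothetical constructor
#     train_with_early_stopping(model, batch_iter(128), validate_elbo)
```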