Ladder Variational Autoencoders

Authors: Casper Kaae Sønderby, Tapani Raiko, Lars Maaløe, Søren Kaae Sønderby, Ole Winther

NeurIPS 2016

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We propose a new inference model, the Ladder Variational Autoencoder, that recursively corrects the generative distribution by a data-dependent approximate likelihood in a process resembling the recently proposed Ladder Network. We show that this model provides state-of-the-art predictive log-likelihood and a tighter log-likelihood lower bound compared to the purely bottom-up inference in layered Variational Autoencoders and other generative models. We provide a detailed analysis of the learned hierarchical latent representation and show that our new inference model is qualitatively different and utilizes a deeper, more distributed hierarchy of latent variables. Finally, we observe that batch normalization and deterministic warm-up (gradually turning on the KL term) are crucial for training variational models with many stochastic layers. (A sketch of this correction step is given after this table.)
Researcher Affiliation | Academia | Casper Kaae Sønderby (casperkaae@gmail.com), Tapani Raiko (tapani.raiko@aalto.fi), Lars Maaløe (larsma@dtu.dk), Søren Kaae Sønderby (skaaesonderby@gmail.com), Ole Winther (olwi@dtu.dk); Bioinformatics Centre, Department of Biology, University of Copenhagen, Denmark; Department of Computer Science, Aalto University, Finland; Department of Applied Mathematics and Computer Science, Technical University of Denmark
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks.
Open Source Code | Yes | The source code is available at https://github.com/casperkaae/LVAE
Open Datasets | Yes | To test our models we use the standard benchmark datasets MNIST, OMNIGLOT [11] and NORB [12].
Dataset Splits | No | The paper does not explicitly provide dataset split information (exact percentages, sample counts, or methodology) for a validation set. It mentions training on the "complete training set" and evaluating on a "test set", but no separate validation split.
Hardware Specification | No | The paper does not provide specific hardware details (exact GPU/CPU models, memory amounts, or detailed machine specifications) used for running its experiments.
Software Dependencies | No | The models were implemented using the Theano [20], Lasagne [5] and Parmesan frameworks. No specific version numbers are provided for these frameworks in the text.
Experiment Setup | Yes | The largest models trained used a hierarchy of five layers of stochastic latent variables of sizes 64, 32, 16, 8 and 4, going from bottom to top. ... In all models the MLPs between x and z1 or d1 were of size 512. Subsequent layers were connected by MLPs of sizes 256, 128, 64 and 32 for all connections in both the VAE and LVAE. ... The models were trained end-to-end using the Adam [8] optimizer with a mini-batch size of 256. ... For MNIST, we used a sigmoid output layer to predict the mean of a Bernoulli observation model and leaky rectifiers (max(x, 0.1x)) as nonlinearities in the MLPs. The models were trained for 2000 epochs with a learning rate of 0.001 on the complete training set. Models using warm-up used Nt = 200. (A configuration sketch summarizing these values follows this table.)
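The inference correction described in the abstract, recursively correcting the generative (top-down) distribution with a data-dependent (bottom-up) approximate likelihood, can be illustrated as a precision-weighted combination of two diagonal Gaussians. The NumPy sketch below is illustrative only; the names `mu_d`, `var_d` (bottom-up term) and `mu_p`, `var_p` (top-down term) are assumptions and are not taken from the released code.

```python
import numpy as np

def precision_weighted_merge(mu_d, var_d, mu_p, var_p):
    """Merge a bottom-up (data-dependent) Gaussian with a top-down
    (generative) Gaussian by weighting each mean with its precision.
    This is a hedged sketch of the correction step, not the authors' code."""
    prec_d = 1.0 / var_d              # precision of the bottom-up estimate
    prec_p = 1.0 / var_p              # precision of the top-down estimate
    var_q = 1.0 / (prec_d + prec_p)   # precisions add
    mu_q = (mu_d * prec_d + mu_p * prec_p) * var_q
    return mu_q, var_q

# Toy usage for a 4-dimensional latent layer (the topmost layer size above)
mu_q, var_q = precision_weighted_merge(
    mu_d=np.zeros(4), var_d=np.ones(4),
    mu_p=np.ones(4),  var_p=2.0 * np.ones(4),
)
print(mu_q, var_q)
```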
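The quoted experiment setup and the deterministic warm-up (Nt = 200) can be summarized as a small configuration sketch. The values below follow the quoted text; the `kl_weight` helper and all variable names are hypothetical, and the released Theano/Lasagne implementation at https://github.com/casperkaae/LVAE remains the authoritative reference.

```python
# Hypothetical configuration mirroring the reported setup (not the authors' code).
latent_sizes  = [64, 32, 16, 8, 4]       # stochastic layers, bottom to top
mlp_sizes     = [512, 256, 128, 64, 32]  # deterministic MLPs between layers
batch_size    = 256                      # Adam mini-batch size
learning_rate = 0.001
n_epochs      = 2000
warmup_epochs = 200                      # Nt in the paper

def kl_weight(epoch, nt=warmup_epochs):
    """Deterministic warm-up: scale the KL term linearly from 0 to 1 over the
    first `nt` epochs ("gradually turning on the KL-term"), then hold it at 1."""
    return min(1.0, epoch / float(nt))

# During training, the per-batch objective would then look roughly like:
#   loss = reconstruction_term + kl_weight(epoch) * kl_term
```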