Ladder Variational Autoencoders
Authors: Casper Kaae Sønderby, Tapani Raiko, Lars Maaløe, Søren Kaae Sønderby, Ole Winther
NeurIPS 2016
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We propose a new inference model, the Ladder Variational Autoencoder, that recursively corrects the generative distribution by a data dependent approximate likelihood in a process resembling the recently proposed Ladder Network. We show that this model provides state of the art predictive log-likelihood and tighter log-likelihood lower bound compared to the purely bottom-up inference in layered Variational Autoencoders and other generative models. We provide a detailed analysis of the learned hierarchical latent representation and show that our new inference model is qualitatively different and utilizes a deeper more distributed hierarchy of latent variables. Finally, we observe that batch-normalization and deterministic warm-up (gradually turning on the KL-term) are crucial for training variational models with many stochastic layers. |
| Researcher Affiliation | Academia | Casper Kaae Sønderby (casperkaae@gmail.com), Tapani Raiko (tapani.raiko@aalto.fi), Lars Maaløe (larsma@dtu.dk), Søren Kaae Sønderby (skaaesonderby@gmail.com), Ole Winther (olwi@dtu.dk); Bioinformatics Centre, Department of Biology, University of Copenhagen, Denmark; Department of Computer Science, Aalto University, Finland; Department of Applied Mathematics and Computer Science, Technical University of Denmark |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | The source code is available at github: https://github.com/casperkaae/LVAE |
| Open Datasets | Yes | To test our models we use the standard benchmark datasets MNIST, OMNIGLOT [11] and NORB [12]. |
| Dataset Splits | No | The paper does not explicitly provide specific dataset split information (exact percentages, sample counts, or explicit methodology) for a validation set. It mentions training on the "complete training set" and evaluating on a "test set" but no separate validation split. |
| Hardware Specification | No | The paper does not provide specific hardware details (exact GPU/CPU models, memory amounts, or detailed computer specifications) used for running its experiments. |
| Software Dependencies | No | The models were implemented using the Theano [20], Lasagne [5] and Parmesan frameworks. (No specific version numbers are provided for these frameworks in the text.) |
| Experiment Setup | Yes | The largest models trained used a hierarchy of five layers of stochastic latent variables of sizes 64, 32, 16, 8 and 4, going from bottom to top. ... In all models the MLPs between x and z1 or d1 were of size 512. Subsequent layers were connected by MLPs of sizes 256, 128, 64 and 32 for all connections in both the VAE and LVAE. ... The models were trained end-to-end using the Adam [8] optimizer with a mini-batch size of 256. ... For MNIST, we used a sigmoid output layer to predict the mean of a Bernoulli observation model and leaky rectifiers (max(x, 0.1x)) as nonlinearities in the MLPs. The models were trained for 2000 epochs with a learning rate of 0.001 on the complete training set. Models using warm-up used Nt = 200. |
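
The Research Type row quotes the abstract's description of the corrective inference step: the bottom-up, data-dependent Gaussian estimate for each latent layer is merged with the top-down generative distribution by precision weighting. Below is a minimal NumPy sketch of that combination for a single layer; the function name and the example values are illustrative assumptions, not taken from the released LVAE code.

```python
import numpy as np

def precision_weighted_combination(mu_hat, var_hat, mu_p, var_p):
    """Combine the bottom-up (data-dependent) Gaussian estimate with the
    top-down (generative) Gaussian by precision weighting for one latent layer.

    mu_hat, var_hat : mean/variance from the deterministic bottom-up pass
    mu_p,   var_p   : mean/variance from the top-down generative pass
    """
    prec_hat = 1.0 / var_hat            # precision of the bottom-up estimate
    prec_p = 1.0 / var_p                # precision of the top-down term
    var_q = 1.0 / (prec_hat + prec_p)   # combined posterior variance
    mu_q = (mu_hat * prec_hat + mu_p * prec_p) * var_q  # precision-weighted mean
    return mu_q, var_q

# Example for one latent layer and a batch of two samples; the top-down term
# here is a standard normal, as for the prior on the top-most layer.
mu_hat = np.array([[0.5, -1.0], [0.2, 0.0]])
var_hat = np.array([[0.4, 0.9], [1.0, 0.25]])
mu_q, var_q = precision_weighted_combination(
    mu_hat, var_hat, np.zeros_like(mu_hat), np.ones_like(var_hat))
```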
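
The Experiment Setup row lists the architecture and training hyperparameters, including the deterministic warm-up schedule (Nt = 200 epochs over which the KL term is gradually turned on). The sketch below shows how such a warm-up coefficient could scale the KL contribution to the ELBO; the constants are copied from the row above, while the function names and the ELBO decomposition are hypothetical, not the authors' Theano/Lasagne implementation.

```python
# Constants quoted from the Experiment Setup row above.
LATENT_SIZES = [64, 32, 16, 8, 4]    # stochastic layers, bottom to top
MLP_SIZES = [512, 256, 128, 64, 32]  # deterministic MLPs between layers
N_EPOCHS = 2000
N_WARMUP = 200                       # Nt: epochs over which the KL term ramps up
LEARNING_RATE = 0.001                # Adam learning rate
BATCH_SIZE = 256

def warmup_beta(epoch, n_warmup=N_WARMUP):
    """Deterministic warm-up: KL weight grows linearly from 0 to 1."""
    return min(1.0, epoch / float(n_warmup))

def warmed_up_elbo(log_px_given_z, kl_per_layer, beta):
    """ELBO with the KL contribution of every stochastic layer scaled by beta."""
    return log_px_given_z - beta * sum(kl_per_layer)

# Example: at epoch 50 the KL term contributes only a quarter of its weight.
print(warmup_beta(50))                                                   # 0.25
print(warmed_up_elbo(-90.0, [3.0, 2.0, 1.0, 0.5, 0.2], warmup_beta(50))) # -91.675
```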