Semi-Amortized Variational Autoencoders

Authors: Yoon Kim, Sam Wiseman, Andrew Miller, David Sontag, Alexander Rush

ICML 2018

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments show this approach outperforms strong autoregressive and variational baselines on standard text and image datasets. "We apply our approach to train deep generative models of text and images, and observe that they outperform autoregressive/VAE/SVI baselines, in addition to direct baselines that combine VAE with SVI but do not perform end-to-end training."
Researcher Affiliation | Academia | (1) School of Engineering and Applied Sciences, Harvard University, Cambridge, MA, USA; (2) CSAIL & IMES, Massachusetts Institute of Technology, Cambridge, MA, USA.
Pseudocode | Yes | Algorithm 1: Semi-Amortized Variational Autoencoders (a code sketch of this training step follows the table).
Open Source Code | Yes | Code is available at https://github.com/harvardnlp/sa-vae.
Open Datasets | Yes | Text modeling on the Yahoo questions corpus from Yang et al. (2017); "We next apply our approach to model images on the OMNIGLOT dataset (Lake et al., 2015)."
Dataset Splits | No | The paper states that the 'Training set consists of 5000 points' for the synthetic data and defers to Appendix B for 'Full details regarding hyperparameters/model architectures for all experiments', but the main text does not give explicit train/validation/test split percentages, per-split sample counts, or citations to predefined splits.
Hardware Specification | No | The paper does not specify the hardware used for experiments, such as exact GPU or CPU models, memory, or cloud instance types.
Software Dependencies | No | The paper does not provide version numbers for software dependencies or libraries used in the experiments (e.g., Python, PyTorch/TensorFlow).
Experiment Setup | Yes | For all experiments we utilize stochastic gradient descent with momentum on the negative ELBO. Our prior is the spherical Gaussian N(0, I) and the variational posterior is a diagonal Gaussian, where the variational parameters are given by the mean vector and the diagonal log variance vector, i.e. λ = [µ, log σ²]. Our architecture and hyperparameters are identical to the LSTM-VAE baselines considered in Yang et al. (2017), except that we train with SGD instead of Adam, which was found to perform better for training LSTMs. Specifically, both the inference network and the generative model are one-layer LSTMs with 1024 hidden units and 512-dimensional word embeddings, and the latent variable is 32-dimensional. For all the variational models we utilize a KL-cost annealing strategy whereby the multiplier on the KL term is increased linearly from 0.1 to 1.0 each batch over 10 epochs. Full details regarding hyperparameters/model architectures for all experiments are in Appendix B. (Sketches of the objective and the annealing schedule follow the table.)
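
As a companion to the Pseudocode and Experiment Setup entries above, here is a minimal PyTorch sketch of the semi-amortized training objective in Algorithm 1, assuming an `encoder` that returns (µ, log σ²) and a `decoder` that returns log p(x | z). The names, interfaces, inner step size `svi_lr`, and number of refinement steps `K` are illustrative stand-ins rather than the authors' released implementation, which optimizes the inner ELBO with SGD with momentum.

```python
import torch

def elbo(decoder, x, mu, logvar, kl_weight=1.0):
    """Single-sample Monte Carlo ELBO with a diagonal Gaussian q(z|x) and N(0, I) prior."""
    std = torch.exp(0.5 * logvar)
    z = mu + std * torch.randn_like(std)        # reparameterization trick
    log_px_z = decoder(z, x)                    # assumed to return log p(x | z)
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return log_px_z - kl_weight * kl

def sa_vae_loss(encoder, decoder, x, K=20, svi_lr=1.0, kl_weight=1.0):
    """Semi-amortized objective: amortized init, K SVI refinement steps, final ELBO."""
    # 1) Amortized initialization of the variational parameters lambda = [mu, log sigma^2].
    mu, logvar = encoder(x)
    # 2) Refine lambda with K steps of gradient ascent on the ELBO (SVI).
    #    create_graph=True keeps the updates differentiable, so the final loss can be
    #    backpropagated through the refinement and into the inference network.
    for _ in range(K):
        inner_loss = -elbo(decoder, x, mu, logvar, kl_weight)
        g_mu, g_logvar = torch.autograd.grad(
            inner_loss, (mu, logvar), create_graph=True)
        mu = mu - svi_lr * g_mu
        logvar = logvar - svi_lr * g_logvar
    # 3) Negative ELBO at the refined lambda; calling .backward() on it updates the
    #    generative model directly and the inference network through the K updates.
    return -elbo(decoder, x, mu, logvar, kl_weight)
```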
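
The quoted setup also specifies a KL-cost annealing schedule whose multiplier rises linearly from 0.1 to 1.0 over 10 epochs, updated each batch. A possible helper, where `step` and `steps_per_epoch` are hypothetical bookkeeping variables of the training loop:

```python
def kl_anneal_weight(step, steps_per_epoch, warmup_epochs=10, start=0.1, end=1.0):
    """Linear KL-cost annealing: the multiplier on the KL term rises from `start`
    to `end` over `warmup_epochs` epochs (updated once per batch), then stays at `end`."""
    frac = min(1.0, step / float(warmup_epochs * steps_per_epoch))
    return start + frac * (end - start)

# Example: with 1000 batches per epoch, the weight reaches 1.0 after step 10000
# and would be passed as `kl_weight` to the sa_vae_loss sketch above.
```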