Importance Weighted Autoencoders

Authors: Yuri Burda, Roger Grosse, Ruslan Salakhutdinov

ICLR 2016

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We show empirically that IWAEs learn richer latent space representations than VAEs, leading to improved test log-likelihood on density estimation benchmarks." From Section 5 (Experimental Results): "We have compared the generative performance of the VAE and IWAE in terms of their held-out log-likelihoods on two density estimation benchmark datasets."
Researcher Affiliation | Academia | "Yuri Burda, Roger Grosse & Ruslan Salakhutdinov, Department of Computer Science, University of Toronto, Toronto, ON, Canada ({yburda,rgrosse,rsalakhu}@cs.toronto.edu)"
Pseudocode | No | The paper describes the training procedure and algorithms using mathematical equations and prose, but does not include a dedicated pseudocode block or algorithm listing.
Open Source Code | No | The paper does not provide any statements or links indicating that the source code for the described methodology is publicly available.
Open Datasets | Yes | "We evaluated the models on two benchmark datasets: MNIST, a dataset of images of handwritten digits (Le Cun et al., 1998), and Omniglot, a dataset of handwritten characters in a variety of world alphabets (Lake et al., 2013)."
Dataset Splits | No | "We used the standard splits of MNIST into 60,000 training and 10,000 test examples, and of Omniglot into 24,345 training and 8,070 test examples." The paper mentions train and test sets but does not specify a separate validation set for hyperparameter tuning during training.
Hardware Specification | No | "In our GPU-based implementation, the samples are processed in parallel by replicating each training example k times within a mini-batch." The paper mentions using GPUs but does not specify any particular model or other hardware details such as CPU type or memory.
Software Dependencies | No | "For optimization, we used Adam (Kingma & Ba, 2015) with parameters β1 = 0.9, β2 = 0.999, ϵ = 10^-4 and minibatches of size 20." The paper mentions the Adam optimizer but does not specify the version of any software library (e.g., TensorFlow, PyTorch) or the programming language used for implementation.
Experiment Setup | Yes | "All models were initialized with the heuristic of Glorot & Bengio (2010). For optimization, we used Adam (Kingma & Ba, 2015) with parameters β1 = 0.9, β2 = 0.999, ϵ = 10^-4 and minibatches of size 20. The training proceeded for 3^i passes over the data with a learning rate of 0.001 · 10^{-i/7} for i = 0, ..., 7 (for a total of Σ_{i=0}^{7} 3^i = 3280 passes over the data)."
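For reference, the quantity behind the Research Type row is the paper's k-sample importance weighted lower bound on log p(x), which the IWAE is trained to maximize. In the paper's notation (h the latent code, q the recognition network, p the generative model) it is, up to minor notational differences:

```latex
\mathcal{L}_k(x) \;=\;
\mathbb{E}_{h_1,\dots,h_k \sim q(h \mid x)}
\left[ \log \frac{1}{k} \sum_{i=1}^{k} \frac{p(x, h_i)}{q(h_i \mid x)} \right]
```

For k = 1 this reduces to the standard VAE bound, and the paper shows the bound is nondecreasing in k, which is the basis for the reported improvements in held-out log-likelihood.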
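The Hardware Specification row quotes the only implementation detail the paper gives: each training example is replicated k times inside a mini-batch so that the k importance samples are processed in parallel on the GPU. The sketch below illustrates that pattern under stated assumptions; the paper does not name a framework, so PyTorch is assumed, and `encoder`/`decoder` are hypothetical callables standing in for the recognition and generative networks.

```python
import math
import torch

def iwae_bound(x, encoder, decoder, k):
    """Monte Carlo estimate of the k-sample importance weighted bound (sketch).

    Assumed (hypothetical) interfaces, not from the paper:
      encoder(x) -> (z, log_q): a sample from q(h|x) and its log-density.
      decoder(z, x) -> log_p:   the joint log-density log p(x, h).
    """
    batch = x.shape[0]
    # Replicate each example k times within the mini-batch, as described in the
    # paper, so all k samples per example are processed in one parallel pass.
    x_rep = x.repeat_interleave(k, dim=0)        # (batch * k, ...)
    z, log_q = encoder(x_rep)                    # one latent sample per replica
    log_p = decoder(z, x_rep)                    # joint log-density per replica
    log_w = (log_p - log_q).view(batch, k)       # unnormalized log importance weights
    # log( (1/k) * sum_i w_i ), computed stably with log-sum-exp.
    bound = torch.logsumexp(log_w, dim=1) - math.log(k)
    return bound.mean()
```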
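The Experiment Setup row quotes the full optimization recipe: Glorot initialization, Adam with β1 = 0.9, β2 = 0.999, ϵ = 10^-4, mini-batches of 20, and 3^i passes over the data at learning rate 0.001 · 10^{-i/7} for i = 0, ..., 7. A minimal sketch of that schedule is below, again assuming PyTorch; `build_model` and `train_one_epoch` are hypothetical helpers, not from the paper.

```python
import torch
from torch import nn

def glorot_init(module):
    # Glorot & Bengio (2010) initialization for weight matrices.
    if isinstance(module, nn.Linear):
        nn.init.xavier_uniform_(module.weight)
        nn.init.zeros_(module.bias)

model = build_model()              # hypothetical model constructor
model.apply(glorot_init)

for i in range(8):                 # i = 0, ..., 7
    lr = 1e-3 * 10 ** (-i / 7)     # learning rate 0.001 * 10^(-i/7)
    optimizer = torch.optim.Adam(model.parameters(), lr=lr,
                                 betas=(0.9, 0.999), eps=1e-4)
    for _ in range(3 ** i):        # 3^i passes per stage; 3280 passes in total
        train_one_epoch(model, optimizer, batch_size=20)   # hypothetical helper
```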