Importance Weighted Autoencoders
Authors: Yuri Burda, Roger Grosse, Ruslan Salakhutdinov
ICLR 2016
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We show empirically that IWAEs learn richer latent space representations than VAEs, leading to improved test log-likelihood on density estimation benchmarks. From Section 5 (Experimental Results): We have compared the generative performance of the VAE and IWAE in terms of their held-out log-likelihoods on two density estimation benchmark datasets. |
| Researcher Affiliation | Academia | Yuri Burda, Roger Grosse & Ruslan Salakhutdinov Department of Computer Science University of Toronto Toronto, ON, Canada {yburda,rgrosse,rsalakhu}@cs.toronto.edu |
| Pseudocode | No | The paper describes the training procedure and algorithms using mathematical equations and prose, but does not include a dedicated pseudocode block or algorithm listing. |
| Open Source Code | No | The paper does not provide any statements or links indicating that the source code for the described methodology is publicly available. |
| Open Datasets | Yes | We evaluated the models on two benchmark datasets: MNIST, a dataset of images of handwritten digits (LeCun et al., 1998), and Omniglot, a dataset of handwritten characters in a variety of world alphabets (Lake et al., 2013). |
| Dataset Splits | No | We used the standard splits of MNIST into 60,000 training and 10,000 test examples, and of Omniglot into 24,345 training and 8,070 test examples. The paper mentions train and test sets but does not specify a separate validation set for hyperparameter tuning during training. |
| Hardware Specification | No | In our GPU-based implementation, the samples are processed in parallel by replicating each training example k times within a mini-batch. The paper mentions using GPUs but does not specify any particular model or other hardware specifications such as CPU type or memory. A minimal sketch of this k-fold replication appears after the table. |
| Software Dependencies | No | For optimization, we used Adam (Kingma & Ba, 2015) with parameters β1 = 0.9, β2 = 0.999, ϵ = 10⁻⁴ and minibatches of size 20. The paper mentions using the Adam optimizer but does not specify the version of any software library (e.g., TensorFlow, PyTorch) or the programming language version used for implementation. |
| Experiment Setup | Yes | All models were initialized with the heuristic of Glorot & Bengio (2010). For optimization, we used Adam (Kingma & Ba, 2015) with parameters β1 = 0.9, β2 = 0.999, ϵ = 10⁻⁴ and minibatches of size 20. The training proceeded for 3^i passes over the data with a learning rate of 0.001 · 10^(−i/7) for i = 0, …, 7 (for a total of ∑_{i=0}^{7} 3^i = 3280 passes over the data). A short sketch of this schedule appears after the table. |
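
The rows above quote the paper's core estimator: a k-sample importance-weighted bound, evaluated on the GPU by replicating each training example k times within a mini-batch. The snippet below is a minimal NumPy sketch of that bound only, not the authors' implementation; the names `iwae_bound`, `log_p_joint`, and `log_q` are illustrative, and the log densities are assumed to come from the recognition network q(h|x) and the generative model p(x, h).

```python
import numpy as np
from scipy.special import logsumexp

def iwae_bound(log_p_joint, log_q, k):
    """k-sample importance-weighted bound
    L_k = E_{h_1..h_k ~ q(.|x)} [ log (1/k) sum_i p(x, h_i) / q(h_i | x) ].

    log_p_joint, log_q: arrays of shape (batch_size * k,), laid out so that
    the k replicas of each training example are contiguous, mirroring the
    quoted trick of replicating each example k times within a mini-batch.
    """
    log_w = (log_p_joint - log_q).reshape(-1, k)  # unnormalized log importance weights
    return logsumexp(log_w, axis=1) - np.log(k)   # one bound value per training example
```

With k = 1 this reduces to the standard variational (VAE) lower bound, which is why the paper treats the VAE as the special case of the IWAE with a single importance sample.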
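
The experiment-setup row compresses the optimizer settings and learning-rate decay into one sentence. The sketch below expands that schedule as plain Python using only the quoted numbers; the training loop and the Adam construction itself are omitted, since the paper specifies only the hyperparameters.

```python
# Adam settings quoted above: beta1 = 0.9, beta2 = 0.999, eps = 1e-4, minibatch size 20.
# Training runs 3**i passes over the data at learning rate 0.001 * 10**(-i / 7)
# for i = 0..7; each tuple is (number_of_passes, learning_rate).
schedule = [(3 ** i, 0.001 * 10 ** (-i / 7)) for i in range(8)]

# Total passes over the data, matching the figure quoted in the table.
assert sum(passes for passes, _ in schedule) == 3280
```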