Reweighted Wake-Sleep

Authors: Jörg Bornschein and Yoshua Bengio

ICLR 2015

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | This interpretation is confirmed experimentally, showing that better likelihood can be achieved with this reweighted wake-sleep procedure. Our experiments show that using a more powerful layer model, such as NADE, yields substantially better generative models. |
| Researcher Affiliation | Academia | Jörg Bornschein and Yoshua Bengio, Department of Computer Science and Operations Research, University of Montreal, Montreal, Quebec, Canada |
| Pseudocode | Yes | Algorithm 1: Reweighted Wake-Sleep training procedure and likelihood estimator. |
| Open Source Code | Yes | Our implementation is available at https://github.com/jbornschein/reweighted-ws/. |
| Open Datasets | Yes | We use the MNIST dataset that was binarized according to Murray and Salakhutdinov (2009) and downloaded in binarized form from (Larochelle, 2011). |
| Dataset Splits | Yes | From these three we always report the experiment with the highest validation log-likelihood. |
| Hardware Specification | No | The paper thanks "Compute Canada, and Calcul Québec for providing computational resources" but does not specify the particular hardware (e.g., GPU or CPU models, memory) used for the experiments. |
| Software Dependencies | No | The paper acknowledges "the developers of Theano (Bergstra et al., 2010; Bastien et al., 2012) for their powerful software" but does not give a version number for Theano or any other software dependency. |
| Experiment Setup | Yes | For training we use stochastic gradient descent with momentum (β = 0.95) and set the mini-batch size to 25. The experiments in this paragraph were run with learning rates of {0.0003, 0.001, and 0.003}. From these three we always report the experiment with the highest validation log-likelihood. In the majority of our experiments a learning rate of 0.001 gave the best results, even across different layer models (SBN, AR-SBN and NADE). If not noted otherwise, we use K = 5 samples during training and K = 100,000 samples to estimate the final log-likelihood on the test set. |
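The Pseudocode row refers to Algorithm 1 of the paper, the reweighted wake-sleep training procedure and likelihood estimator. The sketch below is a hedged reconstruction of its importance-sampling core, not the authors' code: `sample_h`, `log_p_joint`, and `log_q_posterior` are hypothetical callables standing in for the inference network q(h | x) and the generative model p(x, h).

```python
# Hedged sketch of the importance-sampling step used by reweighted wake-sleep
# (Algorithm 1 in the paper). Function names are placeholders, not the
# authors' API from github.com/jbornschein/reweighted-ws.
import numpy as np


def rws_weights_and_loglik(x, sample_h, log_p_joint, log_q_posterior, K=5):
    """Draw K samples h_k ~ q(h | x), return normalized importance weights
    and the estimate log p(x) ~= log( (1/K) * sum_k p(x, h_k) / q(h_k | x) )."""
    hs = [sample_h(x) for _ in range(K)]                     # h_k ~ q(h | x)
    log_w = np.array([log_p_joint(x, h) - log_q_posterior(h, x) for h in hs])

    # log-sum-exp for numerical stability
    m = log_w.max()
    log_px_estimate = m + np.log(np.exp(log_w - m).mean())

    # normalized weights used to weight the wake-phase gradient terms
    w_tilde = np.exp(log_w - m)
    w_tilde /= w_tilde.sum()
    return hs, w_tilde, log_px_estimate
```

During training the per-sample gradients of log p(x, h_k) are averaged with these normalized weights; the same machinery, run with a much larger K (100,000 in the paper), gives the reported test log-likelihood estimates.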
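The Experiment Setup row quotes the concrete hyperparameters reported in the paper. The fragment below simply collects them into a configuration dictionary for reference; the key names are our own, and only the numerical values come from the quoted text.

```python
# Training settings reported in the paper, gathered into one place.
# Key names are illustrative; values are taken from the Experiment Setup quote.
RWS_TRAINING_CONFIG = {
    "optimizer": "sgd_with_momentum",
    "momentum": 0.95,                           # β = 0.95
    "batch_size": 25,
    "learning_rates": [0.0003, 0.001, 0.003],   # best chosen by validation LL
    "k_train": 5,                               # samples per datapoint in training
    "k_test": 100_000,                          # samples for final test log-likelihood
}
```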