Reweighted Wake-Sleep
Authors: Jörg Bornschein and Yoshua Bengio
ICLR 2015
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | This interpretation is confirmed experimentally, showing that better likelihood can be achieved with this reweighted wake-sleep procedure. Our experiments show that using a more powerful layer model, such as NADE, yields substantially better generative models. |
| Researcher Affiliation | Academia | Jörg Bornschein and Yoshua Bengio, Department of Computer Science and Operations Research, University of Montreal, Montreal, Quebec, Canada |
| Pseudocode | Yes | Algorithm 1 Reweighted Wake-Sleep training procedure and likelihood estimator. A hedged sketch of this estimator appears after the table. |
| Open Source Code | Yes | Our implementation is available at https://github.com/jbornschein/reweighted-ws/. |
| Open Datasets | Yes | We use the MNIST dataset that was binarized according to Murray and Salakhutdinov (2009) and downloaded in binarized form from (Larochelle, 2011). |
| Dataset Splits | Yes | From these three we always report the experiment with the highest validation log-likelihood. |
| Hardware Specification | No | The paper mentions 'Compute Canada, and Calcul Québec for providing computational resources' but does not specify particular hardware models (e.g., GPU or CPU types, memory) used for the experiments. |
| Software Dependencies | No | The paper acknowledges 'the developers of Theano (Bergstra et al., 2010; Bastien et al., 2012) for their powerful software', but it does not specify a version number for Theano or any other software dependencies. |
| Experiment Setup | Yes | For training we use stochastic gradient descent with momentum (β=0.95) and set mini-batch size to 25. The experiments in this paragraph were run with learning rates of {0.0003, 0.001, and 0.003}. From these three we always report the experiment with the highest validation log-likelihood. In the majority of our experiments a learning rate of 0.001 gave the best results, even across different layer models (SBN, AR-SBN and NADE). If not noted otherwise, we use K = 5 samples during training and K = 100,000 samples to estimate the final log-likelihood on the test set. A hedged sketch of this training configuration follows the table. |
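The "Pseudocode" row points to Algorithm 1, the reweighted wake-sleep training procedure and likelihood estimator. Below is a minimal numpy sketch of the importance-sampling part of that procedure: it computes the K-sample estimate log p(x) ≈ log((1/K) Σ_k p(x, h_k)/q(h_k|x)) and the normalized weights ω_k that reweight the wake-phase gradients. The function and argument names are illustrative, not the authors'; the released code at https://github.com/jbornschein/reweighted-ws/ is the authoritative implementation.

```python
import numpy as np

def rws_log_likelihood_and_weights(log_p_joint, log_q_posterior):
    """Importance-sampling part of the RWS procedure (illustrative sketch).

    log_p_joint[k]     : log p(x, h_k) under the generative model
    log_q_posterior[k] : log q(h_k | x) under the recognition model,
    for K samples h_k drawn from q(h | x).
    """
    # Unnormalized log importance weights: log w_k = log p(x, h_k) - log q(h_k | x)
    log_w = log_p_joint - log_q_posterior
    K = log_w.shape[0]

    # Estimate log p(x) ~= log (1/K) sum_k w_k, using log-sum-exp for stability
    log_sum_w = np.logaddexp.reduce(log_w)
    log_px = log_sum_w - np.log(K)

    # Normalized weights omega_k = w_k / sum_j w_j; these reweight the per-sample
    # gradients of log p(x, h_k) in the wake-phase parameter update
    omega = np.exp(log_w - log_sum_w)
    return log_px, omega
```

With K = 5 samples this corresponds to the training-time estimate and with K = 100,000 samples to the final test-set log-likelihood estimate quoted above; the log-sum-exp form avoids overflow when the individual weights span many orders of magnitude.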
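The "Experiment Setup" row quotes the optimization settings: SGD with momentum (β = 0.95), mini-batch size 25, learning rates drawn from {0.0003, 0.001, 0.003} with 0.001 usually best, K = 5 samples during training and K = 100,000 for the final evaluation. The sketch below shows one way these hyperparameters could be wired into a momentum update; the constant and function names are hypothetical, and the momentum formulation is one common variant rather than necessarily the exact one in the paper's Theano code.

```python
import numpy as np

# Hyperparameters quoted in the setup above (the constant names are illustrative)
LEARNING_RATE = 0.001    # best of {0.0003, 0.001, 0.003} by validation log-likelihood
MOMENTUM      = 0.95     # momentum coefficient beta
BATCH_SIZE    = 25       # mini-batch size
K_TRAIN       = 5        # importance samples per datapoint during training
K_TEST        = 100_000  # importance samples for the final test log-likelihood

def sgd_momentum_step(params, grads, velocity, lr=LEARNING_RATE, beta=MOMENTUM):
    """One SGD-with-momentum update, applied once per mini-batch of 25 examples.

    params, grads, velocity: dicts mapping parameter names to numpy arrays.
    """
    for name in params:
        velocity[name] = beta * velocity[name] + grads[name]
        params[name] = params[name] - lr * velocity[name]
    return params, velocity
```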