Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026.
Amortised Learning by Wake-Sleep
Authors: Li Wenliang, Theodore Moskovitz, Heishiro Kanagawa, Maneesh Sahani
ICML 2020 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate ALWS on a wide range of generative models. Details for each experiment can be found in Appendix C. The results are shown in Figure 8. According to FID, ALWS-A is the best ML method for binarised MNIST, Fashion, and CIFAR-10. |
| Researcher Affiliation | Academia | Gatsby Computational Neuroscience Unit. Correspondence to: Li K. Wenliang <EMAIL>. |
| Pseudocode | Yes | Algorithm 1: Amortised learning by wake sleep |
| Open Source Code | Yes | Code is at github.com/kevin-w-li/al-ws |
| Open Datasets | Yes | We chose six benchmark datasets: the binarised and original MNIST (LeCun et al., 1998) (B-MNIST and MNIST, respectively), fashion MNIST (Fashion) (Xiao et al., 2017), natural images (Natural) (Hateren & Schaaf, 1998), CIFAR10 (Krizhevsky et al., 2009) and CelebA (Liu et al., 2015). |
| Dataset Splits | No | The paper uses standard benchmark datasets like MNIST, CIFAR-10, etc., but does not explicitly state the train/validation/test splits (e.g., percentages or sample counts) used for its experiments, nor does it cite a specific methodology for splitting. |
| Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models, memory specifications) used for running experiments were mentioned. |
| Software Dependencies | No | The paper mentions various algorithms and frameworks (e.g., DCGAN) but does not provide specific software version numbers for key dependencies (e.g., Python, PyTorch, TensorFlow versions). |
| Experiment Setup | Yes | For ALWS, we used a Gaussian kernel with a bandwidth equal to the median distance between samples generated for each b, and set λ = 0.01. Each algorithm is run for 50 epochs 10 times with different initialisations, except for SIVI where we trained for 1000 epochs with a lower learning rate for stability. |
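
The Experiment Setup row mentions a Gaussian kernel whose bandwidth is the median distance between samples, together with λ = 0.01. The sketch below illustrates that median heuristic in plain Python. It is not the authors' code: the function names, the toy samples, the kernel convention exp(-d² / (2σ²)), and the treatment of λ as a ridge term added to the Gram matrix diagonal are all assumptions made for illustration.

```python
import math
from itertools import combinations
from statistics import median


def median_bandwidth(samples):
    """Median heuristic: bandwidth = median Euclidean distance over distinct pairs."""
    distances = [math.dist(a, b) for a, b in combinations(samples, 2)]
    return median(distances)


def gaussian_gram(samples, bandwidth, ridge=0.0):
    """Gram matrix with entries exp(-||x_i - x_j||^2 / (2 * bandwidth^2)).

    `ridge` is added to the diagonal. Reading lambda as this kind of ridge
    term is an assumption; the table row does not say how lambda is used.
    """
    n = len(samples)
    gram = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            squared_distance = math.dist(samples[i], samples[j]) ** 2
            gram[i][j] = math.exp(-squared_distance / (2.0 * bandwidth ** 2))
        gram[i][i] += ridge
    return gram


# Hypothetical toy samples; the real experiments use samples from the generative model.
samples = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]
sigma = median_bandwidth(samples)
K = gaussian_gram(samples, sigma, ridge=0.01)
```

Setting the bandwidth from the data in this way keeps the kernel scale tied to the spread of the current samples, which avoids hand-tuning it per dataset.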