Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

MADE: Masked Autoencoder for Distribution Estimation

Authors: Mathieu Germain, Karol Gregor, Iain Murray, Hugo Larochelle

ICML 2015 | Venue PDF | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental Experiments demonstrate that this approach is competitive with state-of-the-art tractable distribution estimators. At test time, the method is significantly faster and scales better than other autoregressive estimators.
Researcher Affiliation Collaboration Mathieu Germain, Université de Sherbrooke, Canada; Karol Gregor, Google DeepMind; Iain Murray, University of Edinburgh, United Kingdom; Hugo Larochelle, Université de Sherbrooke, Canada
Pseudocode Yes Algorithm 1 Computation of p(x) and learning gradients for MADE with order and connectivity sampling.
Open Source Code Yes The code to reproduce the experiments of this paper is available at https://github.com/mgermain/MADE/releases/tag/ICML2015.
Open Datasets Yes We use the binary UCI evaluation suite that was first put together in Larochelle & Murray (2011). It's a collection of 7 relatively small datasets from the University of California, Irvine machine learning repository and the OCR-letters dataset from the Stanford AI Lab. Table 2 gives an overview of the scale of those datasets and the way they were split. The version of MNIST we used is the one binarized by Salakhutdinov & Murray (2008).
Dataset Splits Yes Table 2. Number of input dimensions and numbers of examples in the train, validation, and test splits.
    Name          # Inputs   Train   Valid.   Test
    Adult         123        5000    1414     26147
    Connect4      126        16000   4000     47557
    DNA           180        1400    600      1186
    Mushrooms     112        2000    500      5624
    NIPS-0-12     500        400     100      1240
    OCR-letters   128        32152   10000    10000
    RCV1          150        40000   10000    150000
    Web           300        14000   3188     32561
Hardware Specification Yes These timings were obtained on a K20 NVIDIA GPU.
Software Dependencies No The paper mentions 'Theano' and cites related papers (Bastien et al., 2012; Bergstra et al., 2010), but it does not specify a version number for Theano or any other software dependencies used in the experiments.
Experiment Setup Yes All experiments were made using stochastic gradient descent (SGD) with mini-batches of size 100 and a lookahead of 30 for early stopping. The experiments were run with networks of 500 units per hidden layer, using the adadelta learning update (Zeiler, 2012) with a decay of 0.95. The other hyperparameters were varied as Table 3 indicates. We note as # of masks the number of different masks through which MADE cycles during training.
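
The "lookahead of 30 for early stopping" in the Experiment Setup row can be illustrated with a minimal sketch. It assumes that lookahead means training stops once 30 consecutive epochs pass without an improvement in validation loss; the quoted text does not define the term, and the unit (epochs) is also an assumption. The function name `train_with_lookahead` and the `run_epoch` callback are hypothetical and are not taken from the authors' released code.

```python
from typing import Callable, Tuple


def train_with_lookahead(
    run_epoch: Callable[[int], float],
    lookahead: int = 30,        # assumed meaning: epochs to wait without improvement
    max_epochs: int = 10_000,   # hypothetical safety cap, not from the paper
) -> Tuple[int, float]:
    """Run training epochs until validation loss stalls for `lookahead` epochs.

    `run_epoch(epoch)` is a hypothetical callback that trains for one epoch
    (e.g. with mini-batches of size 100) and returns the validation loss.
    Returns (best_epoch, best_valid_loss).
    """
    best_epoch, best_loss = -1, float("inf")
    for epoch in range(max_epochs):
        valid_loss = run_epoch(epoch)
        if valid_loss < best_loss:
            # New best model: remember when it occurred.
            best_epoch, best_loss = epoch, valid_loss
        elif epoch - best_epoch >= lookahead:
            # No improvement for `lookahead` epochs in a row: stop.
            break
    return best_epoch, best_loss
```

In practice the caller would also checkpoint the parameters at each new best epoch and report test results from that checkpoint. This sketch only tracks the epoch index and loss.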