MADE: Masked Autoencoder for Distribution Estimation
Authors: Mathieu Germain, Karol Gregor, Iain Murray, Hugo Larochelle
ICML 2015
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments demonstrate that this approach is competitive with state-of-the-art tractable distribution estimators. At test time, the method is significantly faster and scales better than other autoregressive estimators. |
| Researcher Affiliation | Collaboration | Mathieu Germain (MATHIEU.GERMAIN2@USHERBROOKE.CA), Université de Sherbrooke, Canada; Karol Gregor (KAROL.GREGOR@GMAIL.COM), Google DeepMind; Iain Murray (I.MURRAY@ED.AC.UK), University of Edinburgh, United Kingdom; Hugo Larochelle (HUGO.LAROCHELLE@USHERBROOKE.CA), Université de Sherbrooke, Canada |
| Pseudocode | Yes | Algorithm 1: Computation of p(x) and learning gradients for MADE with order and connectivity sampling. (A hedged NumPy sketch of this mask construction appears after the table.) |
| Open Source Code | Yes | The code to reproduce the experiments of this paper is available at https://github.com/mgermain/MADE/releases/tag/ICML2015. |
| Open Datasets | Yes | We use the binary UCI evaluation suite that was first put together in Larochelle & Murray (2011). It is a collection of 7 relatively small datasets from the University of California, Irvine machine learning repository and the OCR-letters dataset from the Stanford AI Lab. Table 2 gives an overview of the scale of those datasets and the way they were split. The version of MNIST we used is the one binarized by Salakhutdinov & Murray (2008). |
| Dataset Splits | Yes | Table 2 (number of input dimensions and train/valid/test example counts): Adult: 123 inputs, 5000/1414/26147; Connect4: 126 inputs, 16000/4000/47557; DNA: 180 inputs, 1400/600/1186; Mushrooms: 112 inputs, 2000/500/5624; NIPS-0-12: 500 inputs, 400/100/1240; OCR-letters: 128 inputs, 32152/10000/10000; RCV1: 150 inputs, 40000/10000/150000; Web: 300 inputs, 14000/3188/32561. |
| Hardware Specification | Yes | These timings were obtained on a K20 NVIDIA GPU. |
| Software Dependencies | No | The paper mentions 'Theano' and cites related papers (Bastien et al., 2012; Bergstra et al., 2010), but it does not specify a version number for Theano or any other software dependencies used in the experiments. |
| Experiment Setup | Yes | All experiments were run using stochastic gradient descent (SGD) with mini-batches of size 100 and a lookahead of 30 for early stopping. The experiments used networks of 500 units per hidden layer, with the Adadelta learning update (Zeiler, 2012) and a decay of 0.95. The other hyperparameters were varied as Table 3 indicates. The number of different masks through which MADE cycles during training is denoted "# of masks". (A hedged sketch of the Adadelta update and the lookahead early-stopping rule follows the table.) |
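
Since the Pseudocode row only names Algorithm 1, a minimal NumPy sketch of the mask construction it relies on may help in judging that claim. The degree-sampling rules (random input ordering, uniform hidden degrees, strict inequality at the output) follow the construction published in the paper; the function names, the ReLU hidden activation, and the NumPy interface are illustrative assumptions, not the authors' Theano code.

```python
import numpy as np

def sample_made_masks(d_input, hidden_sizes, rng):
    """Sample one set of MADE connectivity masks (order and connectivity
    sampling, as in the paper's Algorithm 1).

    Returns (masks, order): masks[l] has shape (fan_in, fan_out) for
    layer l, and `order` is the sampled input ordering (degrees 1..D).
    """
    # Input degrees: a random permutation of 1..D (order sampling).
    m = [rng.permutation(d_input) + 1]
    # Hidden degrees: uniform in [min degree below, D-1], so every unit
    # keeps at least one incoming and one outgoing connection.
    for h in hidden_sizes:
        m.append(rng.integers(m[-1].min(), d_input, size=h))
    masks = []
    # A unit with degree k may depend on units below it of degree <= k.
    for m_below, m_above in zip(m[:-1], m[1:]):
        masks.append((m_above[None, :] >= m_below[:, None]).astype(float))
    # Output d may only see hidden units of strictly smaller degree,
    # which enforces the autoregressive property of p(x).
    masks.append((m[0][None, :] > m[-1][:, None]).astype(float))
    return masks, m[0]

def made_neg_log_p(x, weights, biases, masks):
    """-log p(x) for a binary vector x under one sampled mask set.

    The network is a plain feed-forward net whose weight matrices are
    elementwise-multiplied by the masks; the ReLU hidden activation is
    an illustrative choice, not taken from the paper.
    """
    h = x
    for W, b, M in zip(weights[:-1], biases[:-1], masks[:-1]):
        h = np.maximum(0.0, h @ (W * M) + b)
    logits = h @ (weights[-1] * masks[-1]) + biases[-1]
    # Bernoulli cross-entropy summed over dimensions equals -log p(x).
    p = 1.0 / (1.0 + np.exp(-logits))
    return -np.sum(x * np.log(p) + (1.0 - x) * np.log(1.0 - p))
```

For example, `sample_made_masks(784, [500, 500], np.random.default_rng(0))` would produce the two hidden masks and one output mask for a binarized-MNIST-sized MADE with the 500-unit hidden layers reported in the setup row.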
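The Experiment Setup row combines three moving parts: the Adadelta update with a decay of 0.95, mini-batches of 100, and early stopping with a lookahead of 30. Below is a hedged sketch of the update rule and the stopping criterion as commonly defined; `train_one_epoch` and `validate` are hypothetical callables, and the epsilon value is a conventional default rather than one reported in the paper.

```python
import numpy as np

def adadelta_step(param, grad, acc_g, acc_dx, rho=0.95, eps=1e-6):
    """One Adadelta update (Zeiler, 2012) with the paper's decay rho=0.95.

    acc_g and acc_dx are running averages of g^2 and dx^2, updated in
    place; eps=1e-6 is a conventional default, not a value from the paper.
    """
    acc_g *= rho
    acc_g += (1.0 - rho) * grad ** 2
    dx = -(np.sqrt(acc_dx + eps) / np.sqrt(acc_g + eps)) * grad
    acc_dx *= rho
    acc_dx += (1.0 - rho) * dx ** 2
    param += dx

def fit_with_lookahead(train_one_epoch, validate, lookahead=30):
    """Early stopping with a 'lookahead of 30': halt once 30 consecutive
    epochs pass without a new best validation NLL. Both arguments are
    hypothetical callables supplied by the caller.
    """
    best, patience = float("inf"), lookahead
    while patience > 0:
        train_one_epoch()          # e.g., SGD over mini-batches of 100
        nll = validate()
        if nll < best:
            best, patience = nll, lookahead   # new best: reset lookahead
        else:
            patience -= 1
    return best
```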