A Deep and Tractable Density Estimator
Authors: Benigno Uria, Iain Murray, Hugo Larochelle
ICML 2014 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We performed experiments on several binary and real-valued datasets to assess the performance of NADEs trained using our order-agnostic procedure. We report the average test log-likelihood of each model, that is, the average log-density of datapoints in a held-out test set. In the case of NADEs trained in an order-agnostic way, we need to choose an ordering of the variables so that one may calculate the density of the test datapoints. We report the average of the average test log-likelihoods using ten different orderings chosen at random. (A sketch of this multi-ordering evaluation follows the table.) |
| Researcher Affiliation | Academia | Benigno Uria (B.URIA@ED.AC.UK) and Iain Murray (I.MURRAY@ED.AC.UK), School of Informatics, University of Edinburgh; Hugo Larochelle (HUGO.LAROCHELLE@USHERBROOKE.CA), Département d'informatique, Université de Sherbrooke |
| Pseudocode | Yes | Algorithm 1 Pretraining of a NADE with n hidden layers on dataset X. |
| Open Source Code | No | The paper does not provide any explicit statement about releasing source code for the methodology or a link to a code repository. |
| Open Datasets | Yes | We performed experiments on several binary and real-valued datasets to assess the performance of NADEs trained using our order-agnostic procedure. [...] We start by measuring the statistical performance of a NADE trained using our order-agnostic procedure on eight binary UCI datasets (Bache & Lichman, 2013). [...] We also present results on binarized-MNIST (Salakhutdinov & Murray, 2008). [...] We also compared the performance of RNADEs trained with our order-agnostic procedure to RNADEs trained for a fixed ordering. We start by comparing the performance on three low-dimensional UCI datasets (Bache & Lichman, 2013) of heterogeneous data, namely: red wine, white wine and parkinsons. [...] We also measured the performance of our new training procedure on 8 by 8 patches of natural images in the BSDS300 dataset. |
| Dataset Splits | Yes | To avoid overfitting, we early-stopped training by estimating the log-likelihood on a validation dataset after each training iteration using the Ĵ_OA estimator, Eq. (12). [...] One ninth of the training set examples were used for validation purposes. [...] The dataset's 200 training image set was partitioned into a training set and a validation set of 180 and 20 images respectively. (A validation-split and early-stopping sketch follows the table.) |
| Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., CPU, GPU models, memory) used for running the experiments. It only discusses computational complexity. |
| Software Dependencies | No | The paper mentions using specific techniques like "rectified linear units" and "Nesterov's accelerated gradient" but does not specify any software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow versions, or specific library versions). |
| Experiment Setup | Yes | Training configuration details common to all datasets (except where specified later on) follow. We trained all order-agnostic NADEs and RNADEs using minibatch stochastic gradient descent on J_OA, Eq. (11). The initial learning rate, which was chosen independently for each dataset, was reduced linearly to reach zero after the last iteration. For the purpose of consistency, we used rectified linear units (Nair & Hinton, 2010) in all experiments. [...] We used Nesterov's accelerated gradient (Sutskever, 2013) with momentum value 0.9. [...] We fixed the number of units per hidden layer to 500, following Larochelle & Murray (2011). We used minibatches of size 100. Training was run for 100 iterations, each consisting of 1000 weight updates. The initial learning rate was cross-validated for each of the datasets among values {0.016, 0.004, 0.001, 0.00025, 0.0000675}. (A minimal training-loop sketch follows the table.) |
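
The evaluation protocol quoted in the Research Type row (average test log-likelihood, averaged over ten random variable orderings) can be summarised with a short sketch. This is not the authors' code: `model.log_density(x, ordering)` is a hypothetical interface assumed to return the per-example log-density of an order-agnostic NADE read out under a given variable ordering.

```python
import numpy as np

def average_test_log_likelihood(model, x_test, n_orderings=10, seed=0):
    """Average test log-likelihood over several random variable orderings
    (ten in the paper's protocol).

    `model.log_density(x, ordering)` is a hypothetical interface assumed to
    return the log-density of each row of `x` when the model is read out
    under the given variable ordering.
    """
    rng = np.random.default_rng(seed)
    n_dims = x_test.shape[1]
    per_ordering = []
    for _ in range(n_orderings):
        ordering = rng.permutation(n_dims)      # random ordering of the input variables
        per_ordering.append(np.mean(model.log_density(x_test, ordering)))
    # The reported figure is the average of the per-ordering average log-likelihoods.
    return float(np.mean(per_ordering)), float(np.std(per_ordering))
```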
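
The Dataset Splits row describes holding out one ninth of the training examples and early-stopping on a validation estimate computed after each training iteration. The sketch below illustrates that loop under stated assumptions: `do_iteration` and `estimate_valid_ll` are placeholder callables standing in for the actual minibatch updates and for the Ĵ_OA validation estimator of Eq. (12), neither of which is reproduced here.

```python
import numpy as np

def split_train_valid(x_train, valid_fraction=1.0 / 9.0, seed=0):
    """Hold out one ninth of the training examples for validation
    (the shuffling here is illustrative, not the paper's exact split)."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(x_train))
    n_valid = int(round(valid_fraction * len(x_train)))
    return x_train[idx[n_valid:]], x_train[idx[:n_valid]]

def train_with_early_stopping(do_iteration, estimate_valid_ll, n_iterations=100):
    """Run training iterations and remember the best validation score.

    `do_iteration` performs one training iteration (1000 weight updates in
    the quoted setup); `estimate_valid_ll` returns the validation estimate
    used for early stopping. Both are placeholders for user code.
    """
    best_ll, best_iteration = -np.inf, -1
    for it in range(n_iterations):
        do_iteration()
        valid_ll = estimate_valid_ll()
        if valid_ll > best_ll:              # keep the best iteration seen so far
            best_ll, best_iteration = valid_ll, it
    return best_iteration, best_ll
```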
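
Finally, the training configuration in the Experiment Setup row (minibatch SGD on J_OA with a linearly decayed learning rate and Nesterov momentum 0.9) could be wired up roughly as follows. This is a generic sketch of Nesterov's accelerated gradient on a flat parameter vector, not the paper's implementation; `grad_fn` is assumed to return the minibatch gradient of the training objective.

```python
import numpy as np

def linear_lr_schedule(initial_lr, total_updates):
    """Learning rate decayed linearly from `initial_lr` to zero over the
    whole run (100 iterations x 1000 updates in the quoted setup)."""
    return lambda t: initial_lr * (1.0 - t / float(total_updates))

def nesterov_sgd_step(params, velocity, grad_fn, lr, momentum=0.9):
    """One Nesterov accelerated-gradient update on a flat parameter vector.
    `grad_fn(p)` is assumed to return the minibatch gradient of the
    training objective (J_OA, Eq. 11) at parameters `p`."""
    lookahead = params + momentum * velocity    # gradient is taken at the look-ahead point
    velocity = momentum * velocity - lr * grad_fn(lookahead)
    return params + velocity, velocity

# Illustrative usage on a toy quadratic objective (not the NADE objective):
if __name__ == "__main__":
    grad_fn = lambda p: 2.0 * p                 # gradient of ||p||^2
    params, velocity = np.ones(5), np.zeros(5)
    lr_at = linear_lr_schedule(initial_lr=0.004, total_updates=100 * 1000)
    for t in range(100 * 1000):
        params, velocity = nesterov_sgd_step(params, velocity, grad_fn, lr_at(t))
    print(params)                               # converges close to zero
```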