A Deep and Tractable Density Estimator
Authors: Benigno Uria, Iain Murray, Hugo Larochelle
ICML 2014 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We performed experiments on several binary and real-valued datasets to assess the performance of NADEs trained using our order-agnostic procedure. We report the average test log-likelihood of each model, that is, the average log-density of datapoints in a held-out test set. In the case of NADEs trained in an order-agnostic way, we need to choose an ordering of the variables so that one may calculate the density of the test datapoints. We report the average of the average test log-likelihoods using ten different orderings chosen at random. (A sketch of this multi-ordering evaluation follows the table.) |
| Researcher Affiliation | Academia | Benigno Uria (B.URIA@ED.AC.UK) and Iain Murray (I.MURRAY@ED.AC.UK), School of Informatics, University of Edinburgh; Hugo Larochelle (HUGO.LAROCHELLE@USHERBROOKE.CA), Département d'informatique, Université de Sherbrooke |
| Pseudocode | Yes | Algorithm 1 Pretraining of a NADE with n hidden layers on dataset X. |
| Open Source Code | No | The paper does not provide any explicit statement about releasing source code for the methodology or a link to a code repository. |
| Open Datasets | Yes | We performed experiments on several binary and real-valued datasets to assess the performance of NADEs trained using our order-agnostic procedure. [...] We start by measuring the statistical performance of a NADE trained using our order-agnostic procedure on eight binary UCI datasets (Bache & Lichman, 2013). [...] We also present results on binarized-MNIST (Salakhutdinov & Murray, 2008). [...] We also compared the performance of RNADEs trained with our order-agnostic procedure to RNADEs trained for a fixed ordering. We start by comparing the performance on three low-dimensional UCI datasets (Bache & Lichman, 2013) of heterogeneous data, namely: red wine, white wine and parkinsons. [...] We also measured the performance of our new training procedure on 8 by 8 patches of natural images in the BSDS300 dataset. |
| Dataset Splits | Yes | To avoid overfitting, we early-stopped training by estimating the log-likelihood on a validation dataset after each training iteration using the Ĵ_OA estimator, Eq. (12). [...] One ninth of the training set examples were used for validation purposes. [...] The dataset's 200 training image set was partitioned into a training set and a validation set of 180 and 20 images respectively. (A validation-split and early-stopping sketch follows the table.) |
| Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., CPU, GPU models, memory) used for running the experiments. It only discusses computational complexity. |
| Software Dependencies | No | The paper mentions using specific techniques like "rectified linear units" and "Nesterov's accelerated gradient" but does not specify any software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow versions, or specific library versions). |
| Experiment Setup | Yes | Training configuration details common to all datasets (except where specified later on) follow. We trained all order-agnostic NADEs and RNADEs using minibatch stochastic gradient descent on J_OA, Eq. (11). The initial learning rate, which was chosen independently for each dataset, was reduced linearly to reach zero after the last iteration. For the purpose of consistency, we used rectified linear units (Nair & Hinton, 2010) in all experiments. [...] We used Nesterov's accelerated gradient (Sutskever, 2013) with momentum value 0.9. [...] We fixed the number of units per hidden layer to 500, following Larochelle & Murray (2011). We used minibatches of size 100. Training was run for 100 iterations, each consisting of 1000 weight updates. The initial learning rate was cross-validated for each of the datasets among values {0.016, 0.004, 0.001, 0.00025, 0.0000675}. (A minimal training-loop sketch follows the table.) |
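
The evaluation protocol quoted in the Research Type row (average test log-likelihood, averaged over ten random variable orderings) can be summarised with a short sketch. This is not the authors' code: `model.log_density(x, ordering)` is a hypothetical interface assumed to return the per-example log-density of an order-agnostic NADE read out under a given variable ordering.

```python
import numpy as np

def average_test_log_likelihood(model, x_test, n_orderings=10, seed=0):
    """Average test log-likelihood over several random variable orderings
    (ten in the paper's protocol).

    `model.log_density(x, ordering)` is a hypothetical interface assumed to
    return the log-density of each row of `x` when the model is read out
    under the given variable ordering.
    """
    rng = np.random.default_rng(seed)
    n_dims = x_test.shape[1]
    per_ordering = []
    for _ in range(n_orderings):
        ordering = rng.permutation(n_dims)      # random ordering of the input variables
        per_ordering.append(np.mean(model.log_density(x_test, ordering)))
    # The reported figure is the average of the per-ordering average log-likelihoods.
    return float(np.mean(per_ordering)), float(np.std(per_ordering))
```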
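
The Dataset Splits row describes holding out one ninth of the training examples and early-stopping on a validation estimate computed after each training iteration. The sketch below illustrates that loop under stated assumptions: `do_iteration` and `estimate_valid_ll` are placeholder callables standing in for the actual minibatch updates and for the Ĵ_OA validation estimator of Eq. (12), neither of which is reproduced here.

```python
import numpy as np

def split_train_valid(x_train, valid_fraction=1.0 / 9.0, seed=0):
    """Hold out one ninth of the training examples for validation
    (the shuffling here is illustrative, not the paper's exact split)."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(x_train))
    n_valid = int(round(valid_fraction * len(x_train)))
    return x_train[idx[n_valid:]], x_train[idx[:n_valid]]

def train_with_early_stopping(do_iteration, estimate_valid_ll, n_iterations=100):
    """Run training iterations and remember the best validation score.

    `do_iteration` performs one training iteration (1000 weight updates in
    the quoted setup); `estimate_valid_ll` returns the validation estimate
    used for early stopping. Both are placeholders for user code.
    """
    best_ll, best_iteration = -np.inf, -1
    for it in range(n_iterations):
        do_iteration()
        valid_ll = estimate_valid_ll()
        if valid_ll > best_ll:              # keep the best iteration seen so far
            best_ll, best_iteration = valid_ll, it
    return best_iteration, best_ll
```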
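
Finally, the training configuration in the Experiment Setup row (minibatch SGD on J_OA with a linearly decayed learning rate and Nesterov momentum 0.9) could be wired up roughly as follows. This is a generic sketch of Nesterov's accelerated gradient on a flat parameter vector, not the paper's implementation; `grad_fn` is assumed to return the minibatch gradient of the training objective.

```python
import numpy as np

def linear_lr_schedule(initial_lr, total_updates):
    """Learning rate decayed linearly from `initial_lr` to zero over the
    whole run (100 iterations x 1000 updates in the quoted setup)."""
    return lambda t: initial_lr * (1.0 - t / float(total_updates))

def nesterov_sgd_step(params, velocity, grad_fn, lr, momentum=0.9):
    """One Nesterov accelerated-gradient update on a flat parameter vector.
    `grad_fn(p)` is assumed to return the minibatch gradient of the
    training objective (J_OA, Eq. 11) at parameters `p`."""
    lookahead = params + momentum * velocity    # gradient is taken at the look-ahead point
    velocity = momentum * velocity - lr * grad_fn(lookahead)
    return params + velocity, velocity

# Illustrative usage on a toy quadratic objective (not the NADE objective):
if __name__ == "__main__":
    grad_fn = lambda p: 2.0 * p                 # gradient of ||p||^2
    params, velocity = np.ones(5), np.zeros(5)
    lr_at = linear_lr_schedule(initial_lr=0.004, total_updates=100 * 1000)
    for t in range(100 * 1000):
        params, velocity = nesterov_sgd_step(params, velocity, grad_fn, lr_at(t))
    print(params)                               # converges close to zero
```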