Iterative Neural Autoregressive Distribution Estimator NADE-k

Authors: Tapani Raiko, Yao Li, Kyunghyun Cho, Yoshua Bengio

NeurIPS 2014 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We study the proposed model with two datasets: binarized MNIST handwritten digits and Caltech 101 silhouettes. We report in Table 1 the mean of the test log-probabilities averaged over randomly selected orderings.
Researcher Affiliation | Academia | Tapani Raiko, Aalto University; Li Yao, Université de Montréal; Kyunghyun Cho, Université de Montréal; Yoshua Bengio, Université de Montréal, CIFAR Senior Fellow
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks.
Open Source Code | Yes | We have made our implementation available git@github.com:yaoli/nade_k.git
Open Datasets | Yes | We study the proposed model with two datasets: binarized MNIST handwritten digits and Caltech 101 silhouettes. ... We closely followed the procedure used by Uria et al. (2014), including the split of the dataset... We also evaluate the proposed NADE-k on Caltech-101 Silhouettes (Marlin et al., 2010), using the standard split...
Dataset Splits | Yes | We closely followed the procedure used by Uria et al. (2014), including the split of the dataset into 50,000 training samples, 10,000 validation samples and 10,000 test samples. ... using the standard split of 4100 training samples, 2264 validation samples and 2307 test samples. (See the split sketch after the table.)
Hardware Specification | No | The paper does not provide specific hardware details (exact GPU/CPU models, processor types, or memory amounts) used for running its experiments.
Software Dependencies | No | The paper mentions using AdaDelta and Theano but does not provide specific version numbers for these or any other software dependencies.
Experiment Setup | Yes | We use stochastic gradient descent on the training set with a minibatch size fixed to 100. ... We used a fixed width of 500 units per hidden layer. The number of steps k was selected among {1, 2, 4, 5, 7}. ... Each model was pretrained for 1000 epochs and fine-tuned for 1000 epochs in the case of one hidden layer and 2000 epochs in the case of two. ... The regularization constant was chosen to be 0.00122 for the two-hidden-layer model. (See the hyperparameter sketch after the table.)
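
The split sizes quoted in the Dataset Splits row map directly onto a small configuration. The sketch below is only illustrative: it assumes the examples are already loaded as a NumPy array in the canonical order of the standard splits, and the function and variable names are hypothetical rather than taken from the authors' code.

```python
import numpy as np

# Split sizes quoted from the paper: Uria et al. (2014) protocol for binarized MNIST,
# standard split for Caltech-101 Silhouettes.
SPLITS = {
    "binarized_mnist": {"train": 50_000, "valid": 10_000, "test": 10_000},
    "caltech101_silhouettes": {"train": 4_100, "valid": 2_264, "test": 2_307},
}

def split_dataset(x, sizes):
    """Cut an (N, D) array into train/valid/test blocks of the quoted sizes.

    Assumes rows are already in the canonical order of the standard split;
    this is an illustrative sketch, not the authors' loading code.
    """
    n_train, n_valid = sizes["train"], sizes["valid"]
    assert x.shape[0] == n_train + n_valid + sizes["test"]
    return x[:n_train], x[n_train:n_train + n_valid], x[n_train + n_valid:]

# Example with random placeholder data shaped like binarized MNIST (784 pixels).
x = (np.random.rand(70_000, 784) > 0.5).astype(np.float32)
train, valid, test = split_dataset(x, SPLITS["binarized_mnist"])
```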
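
The Experiment Setup row reads as a compact hyperparameter list. Below is a minimal sketch, assuming plain minibatch SGD with an L2 penalty as a rough stand-in for the paper's AdaDelta/Theano training; `grad_fn`, `params`, and the learning rate are hypothetical placeholders, not the authors' implementation.

```python
import numpy as np

# Hyperparameters quoted in the Experiment Setup row; names here are illustrative.
CONFIG = {
    "minibatch_size": 100,
    "hidden_width": 500,              # units per hidden layer
    "k_candidates": [1, 2, 4, 5, 7],  # number of NADE-k steps searched over
    "pretrain_epochs": 1000,
    "finetune_epochs": {"one_hidden_layer": 1000, "two_hidden_layers": 2000},
    "l2_regularization": 0.00122,     # reported for the two-hidden-layer model
}

def sgd_epoch(params, data, grad_fn, lr=0.001, batch_size=CONFIG["minibatch_size"]):
    """One epoch of plain minibatch SGD over shuffled training data.

    `grad_fn(params, batch)` is assumed to return gradients with the same
    structure as `params`; the paper itself uses AdaDelta-style updates.
    """
    order = np.random.permutation(len(data))
    for start in range(0, len(data), batch_size):
        batch = data[order[start:start + batch_size]]
        grads = grad_fn(params, batch)
        for name in params:
            # L2 weight decay folded into the update as a simple stand-in.
            params[name] -= lr * (grads[name] + CONFIG["l2_regularization"] * params[name])
    return params
```

Choosing k from {1, 2, 4, 5, 7} and applying the pretraining and fine-tuning epoch counts would sit outside this inner loop, as part of the model-selection procedure described in the quote.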