Autoregressive Energy Machines

Authors: Charlie Nash, Conor Durkan

ICML 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | The Autoregressive Energy Machine achieves state-of-the-art performance on a suite of density-estimation tasks. For our experiments, we use a ResMADE with four residual blocks for the ARNN, as well as a fully-connected residual architecture for the ENN, also with four residual blocks.
Researcher Affiliation | Academia | School of Informatics, University of Edinburgh, United Kingdom. Correspondence to: Charlie Nash <charlie.nash@ed.ac.uk>, Conor Durkan <conor.durkan@ed.ac.uk>.
Pseudocode | No | No structured pseudocode or algorithm blocks were found in the paper.
Open Source Code | Yes | Code is available at https://github.com/conormdurkan/autoregressive-energy-machines.
Open Datasets | Yes | UCI machine learning repository (Dheeru & Karra Taniskidou, 2017), and BSDS300 datasets of natural images (Martin et al., 2001).
Dataset Splits | Yes | Then, we compute the integral of the log unnormalized density corresponding to that one-dimensional conditional, using a log-trapezoidal rule and context vectors generated from a held-out validation set of 1000 samples. The KDE bandwidth and proposal distribution mixture weighting are optimized on the validation set. (A sketch of the log-trapezoidal rule follows the table.)
Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models, memory) used for running experiments are provided in the main text of the paper.
Software Dependencies | No | The paper mentions using the Adam optimizer but does not provide specific software dependencies with version numbers (e.g., Python, PyTorch, or other library versions).
Experiment Setup | Yes | For our experiments, we use a ResMADE with four residual blocks for the ARNN, as well as a fully-connected residual architecture for the ENN, also with four residual blocks. The number of hidden units in the ResMADE is varied per task. We use the Adam optimizer (Kingma & Ba, 2014), and anneal the learning rate to zero over the course of training using a cosine schedule (Loshchilov & Hutter, 2016). For some tasks, we find regularization by dropout (Srivastava et al., 2014) to be beneficial. (A training-loop sketch of this recipe follows the table.)
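
The Dataset Splits row quotes the paper's use of a log-trapezoidal rule to integrate the log unnormalized density of a one-dimensional conditional. The paper does not give this rule as code, so the following is a minimal sketch in Python/NumPy under our own assumptions: the function name log_trapezoid, the uniform grid, and the Gaussian sanity check are illustrative and not taken from the authors' implementation.

```python
import numpy as np
from scipy.special import logsumexp

def log_trapezoid(log_f, x):
    """Log of the trapezoidal-rule estimate of integral(exp(log_f)) over a
    uniform grid x, evaluated in log space for numerical stability."""
    dx = x[1] - x[0]                    # uniform grid spacing (assumed)
    log_w = np.zeros_like(log_f)        # trapezoidal weights: log(1) inside...
    log_w[0] = log_w[-1] = np.log(0.5)  # ...and log(1/2) at the endpoints
    return logsumexp(log_f + log_w) + np.log(dx)

# Sanity check: the unnormalized standard Gaussian exp(-x**2 / 2) should give
# log Z = 0.5 * log(2 * pi), approximately 0.9189.
x = np.linspace(-10.0, 10.0, 1025)
print(log_trapezoid(-0.5 * x**2, x))
```

The logsumexp over weighted log-density values avoids exponentiating large negative log densities directly, which is the usual reason for working in log space here.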
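The Experiment Setup row describes the optimization recipe: Adam with a cosine schedule that anneals the learning rate to zero, plus dropout on some tasks. Below is a minimal PyTorch sketch of that recipe only; the toy model, placeholder loss, batch size, learning rate, and step count are illustrative values, not the paper's, which uses ResMADE/ENN architectures and an energy-based likelihood objective from the authors' repository.

```python
import torch
import torch.nn as nn

# Placeholder network standing in for the paper's ResMADE + ENN; dropout is
# included because the paper reports it helps on some tasks.
model = nn.Sequential(
    nn.Linear(8, 256), nn.ReLU(), nn.Dropout(p=0.1), nn.Linear(256, 8)
)

num_steps = 10_000  # illustrative training length
optimizer = torch.optim.Adam(model.parameters(), lr=5e-4)
# Cosine schedule that anneals the learning rate to zero over training.
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(
    optimizer, T_max=num_steps, eta_min=0.0
)

for step in range(num_steps):
    batch = torch.randn(128, 8)        # placeholder data
    loss = model(batch).pow(2).mean()  # placeholder objective
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    scheduler.step()
```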