Transformation Autoregressive Networks

Authors: Junier Oliva, Avinava Dubey, Manzil Zaheer, Barnabas Poczos, Ruslan Salakhutdinov, Eric Xing, Jeff Schneider

ICML 2018 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "4. Experiments. We now present empirical studies for our TAN framework in order to establish (i) the superiority of TANs over one-prong approaches (Sec. 4.1), (ii) that TANs are accurate on real world datasets (Sec. 4.2), (iii) the importance of various components of TANs, (iv) that TANs are easily amenable to various tasks (Sec. 4.4), such as learning a parametric family of distributions and being able to generalize over unseen parameter values (Sec. 4.5)."
Researcher Affiliation | Academia | "1 Computer Science Department, University of North Carolina, Chapel Hill, NC 27599 (Work completed while at CMU.) 2 Machine Learning Department, Carnegie Mellon University, Pittsburgh, PA 15213."
Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks.
Open Source Code | Yes | "See https://github.com/lupalab/tan" (paper footnote 1).
Open Datasets | Yes | "We carefully followed (Papamakarios et al., 2017) and code (MAF Git Repository) to ensure that we operated over the same instances and covariates for each of the datasets considered in (Papamakarios et al., 2017). Specifically we performed unconditional density estimation on four datasets from the UCI machine learning repository (http://archive.ics.uci.edu/ml/): POWER: d=6, N=2,049,280; GAS: d=8, N=1,052,065; HEPMASS: d=21, N=525,123; MINIBOONE: d=43, N=36,488; BSDS300: d=63, N=1,300,000 ..."
Dataset Splits | Yes | "After training, the best iteration according to the validation set loss was used to produce the test set results." (A sketch of this selection rule appears after the table.)
Hardware Specification | No | The paper does not provide specific hardware details such as GPU/CPU models, processor types, or memory amounts used for running experiments.
Software Dependencies | No | "Models were implemented in Tensorflow (Abadi et al., 2016)." While TensorFlow is mentioned, no specific version number for it or other software libraries is provided.
Experiment Setup | Yes | "We take the mixture models of conditionals (2) to be mixtures of 40 Gaussians. We optimize all models using the Adam Optimizer (Kingma & Ba, 2014) with an initial learning rate of 0.005. Training consisted of 30000 iterations, with mini-batches of size 256. The learning rate was decreased by a factor of 0.1, or 0.5 (chosen via a validation set), every 5000 iterations. Gradient clipping with a norm of 1 was used." (A hedged sketch of these settings follows below.)
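
To make the Experiment Setup row concrete, here is a minimal TensorFlow 1.x sketch of the quoted optimization settings: Adam at an initial learning rate of 0.005, a staircase decay by a factor of 0.1 (or 0.5) every 5000 iterations, mini-batches of 256, and global-norm gradient clipping at 1. This is a hypothetical sketch, not the authors' code; the loss is a throwaway stand-in for the TAN negative log-likelihood (the real model uses 40-component Gaussian-mixture conditionals), and all variable names are illustrative.

```python
# Hypothetical TF 1.x sketch (names and the dummy loss are illustrative, not the authors' code).
import numpy as np
import tensorflow as tf  # the paper reports a TensorFlow implementation; no version is stated

BATCH_SIZE = 256    # "mini-batches of size 256"
INIT_LR = 0.005     # initial Adam learning rate
DECAY_EVERY = 5000  # learning rate decreased every 5000 iterations
DECAY_RATE = 0.1    # or 0.5, chosen via the validation set
CLIP_NORM = 1.0     # global-norm gradient clipping

# Throwaway stand-in for the TAN negative log-likelihood.
x = tf.placeholder(tf.float32, [None, 6])            # e.g. POWER has d = 6 covariates
w = tf.get_variable("w", [6, 1])
loss = tf.reduce_mean(tf.square(tf.matmul(x, w)))

global_step = tf.train.get_or_create_global_step()
learning_rate = tf.train.exponential_decay(          # staircase decay every DECAY_EVERY steps
    INIT_LR, global_step, DECAY_EVERY, DECAY_RATE, staircase=True)

optimizer = tf.train.AdamOptimizer(learning_rate)
grads, variables = zip(*optimizer.compute_gradients(loss))
clipped, _ = tf.clip_by_global_norm(grads, CLIP_NORM)  # gradient clipping with norm 1
train_op = optimizer.apply_gradients(zip(clipped, variables), global_step=global_step)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    batch = np.random.randn(BATCH_SIZE, 6).astype(np.float32)
    sess.run(train_op, feed_dict={x: batch})          # one of the 30000 training iterations
```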
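
The Dataset Splits row states only that the best iteration by validation loss was used for the test results. The plain-Python sketch below illustrates that selection rule under assumptions of my own: the evaluation interval and the two helper functions are not from the paper.

```python
# Hypothetical sketch of best-iteration selection by validation loss.
import random

NUM_ITERS = 30000   # "Training consisted of 30000 iterations"
EVAL_EVERY = 500    # assumed evaluation interval (not stated in the paper)

def train_step():
    pass            # stand-in for one mini-batch Adam update

def validation_loss():
    return random.random()  # stand-in for the average validation-set loss

best_val, best_iter = float("inf"), None
for it in range(NUM_ITERS):
    train_step()
    if it % EVAL_EVERY == 0:
        val = validation_loss()
        if val < best_val:
            best_val, best_iter = val, it   # remember the best iteration (and save its parameters)

# Test-set results are then produced from the parameters saved at `best_iter`.
print(best_iter, best_val)
```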