Neural Autoregressive Flows
Authors: Chin-Wei Huang, David Krueger, Alexandre Lacoste, Aaron Courville
ICML 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimentally, NAF yields state-of-the-art performance on a suite of density estimation tasks and outperforms IAF in variational autoencoders trained on binarized MNIST. |
| Researcher Affiliation | Collaboration | Chin-Wei Huang 1 2 * David Krueger 1 2 * Alexandre Lacoste 2 Aaron Courville 1 3 Normalizing flows and autoregressive models have been successfully combined to produce state-of-the-art results in density estimation, via Masked Autoregressive Flows (MAF) (Papamakarios et al., 2017), and to accelerate state-of-the-art WaveNet-based speech synthesis to 20x faster than real-time (Oord et al., 2017), via Inverse Autoregressive Flows (IAF) (Kingma et al., 2016). We unify and generalize these approaches, replacing the (conditionally) affine univariate transformations of MAF/IAF with a more general class of invertible univariate transformations expressed as monotonic neural networks. We demonstrate that the proposed neural autoregressive flows (NAF) are universal approximators for continuous probability distributions, and their greater expressivity allows them to better capture multimodal target distributions. Experimentally, NAF yields state-of-the-art performance on a suite of density estimation tasks and outperforms IAF in variational autoencoders trained on binarized MNIST. 1. Introduction: Invertible transformations with a tractable Jacobian, also known as normalizing flows, are useful tools in many machine learning problems, for example: (1) In the context of deep generative models, training necessitates evaluating data samples under the model's inverse transformation (Dinh et al., 2017). Tractable density is an appealing property for these models, since it allows the objective of interest to be directly optimized; whereas other mainstream methods rely on alternative losses, in the case of intractable density models (Kingma & Welling, 2014; Rezende et al., 2014). *Equal contribution. 1 MILA, University of Montreal, 2 Element AI, 3 CIFAR fellow. Correspondence to: Chin-Wei Huang <chinwei.huang@umontreal.ca>. (The affine step that NAF generalizes is sketched after this table.) |
| Pseudocode | No | The paper describes the proposed method using mathematical equations and diagrams, but does not include any explicit pseudocode or algorithm blocks. |
| Open Source Code | Yes | 1Implementation can be found at https://github.com/CWHuang/NAF/ |
| Open Datasets | Yes | For larger-scale experiments, we show that using NAF instead of IAF to approximate the posterior distribution of latent variables in a variational autoencoder (Kingma & Welling, 2014; Rezende et al., 2014) yields better likelihood results on binarized MNIST (Larochelle & Murray, 2011) (Section 6.3). Finally, we report our experimental results on density estimation of a suite of UCI datasets (Section 6.4). |
| Dataset Splits | No | The paper reports "validation results" in Table 2, but does not provide explicit details about the specific training/validation/test splits (e.g., percentages, sample counts, or references to predefined splits) used for these datasets. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for running experiments (e.g., GPU models, CPU types, or cloud instance specifications). |
| Software Dependencies | No | The paper mentions deep learning frameworks but does not explicitly list specific software dependencies with their version numbers required to reproduce the experiments. |
| Experiment Setup | Yes | We find that small neural network transformers of 1 or 2 hidden layers with 8 or 16 sigmoid units perform well across our experiments, although there are other possibilities worth exploring (see Section 3.3). Sigmoids contain inflection points, and so can easily induce inflection points in τ_c, and thus multimodality in p(y_t). (A minimal sketch of such a sigmoidal transformer follows this table.) |
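
The affiliation excerpt quotes the abstract's claim that NAF replaces the (conditionally) affine univariate transformations of MAF/IAF with monotonic neural networks. For orientation only (this is not the authors' code, and the names are ours), the affine step being generalized can be written as:

```python
def affine_iaf_step(x_t, mu_t, sigma_t):
    """(Conditionally) affine univariate transform used by IAF/MAF.

    mu_t and sigma_t (> 0) would be produced by an autoregressive
    conditioner over x_{1:t-1}; NAF swaps this map for a monotonic
    neural network, sketched below.
    """
    return mu_t + sigma_t * x_t
```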
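
The "Experiment Setup" row describes transformers of 1 or 2 hidden layers with 8 or 16 sigmoid units. Below is a minimal NumPy sketch, under our own naming and the assumption that the pseudo-parameters `a`, `b`, `w` come from an autoregressive conditioner as in the paper; it shows a single-hidden-layer sigmoidal transformer whose sigmoid units can induce the inflection points mentioned in the quote. The paper additionally tracks the log-determinant of the Jacobian through these operations for density evaluation; the sketch omits that bookkeeping.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def inv_sigmoid(y):
    # logit; assumes 0 < y < 1
    return np.log(y) - np.log(1.0 - y)

def softplus(x):
    return np.log1p(np.exp(x))

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def sigmoidal_transformer(x_t, a, b, w):
    """Single-hidden-layer sigmoidal transformer tau_c (hypothetical sketch).

    a, b, w play the role of pseudo-parameters that an autoregressive
    conditioner c(x_{1:t-1}) would output in NAF; here they are free arrays
    with one entry per hidden sigmoid unit. softplus keeps the slopes
    positive and softmax keeps the mixing weights positive and normalized,
    so the map is strictly monotonic (hence invertible) in x_t.
    """
    pre = softplus(a) * x_t + b      # per-unit affine pre-activation
    hidden = sigmoid(pre)            # sigmoid units can create inflection points
    y = np.dot(softmax(w), hidden)   # convex combination, lies in (0, 1)
    return inv_sigmoid(y)            # map back to the real line

# Instantiate with 8 sigmoid units, matching the 8-16 units quoted above.
rng = np.random.default_rng(0)
a, b, w = rng.normal(size=(3, 8))
print(sigmoidal_transformer(0.5, a, b, w))
```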