Neural Autoregressive Flows
Authors: Chin-Wei Huang, David Krueger, Alexandre Lacoste, Aaron Courville
ICML 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimentally, NAF yields state-of-the-art performance on a suite of density estimation tasks and outperforms IAF in variational autoencoders trained on binarized MNIST. |
| Researcher Affiliation | Collaboration | Chin-Wei Huang 1 2 * David Krueger 1 2 * Alexandre Lacoste 2 Aaron Courville 1 3 Normalizing flows and autoregressive models have been successfully combined to produce state-of-the-art results in density estimation, via Masked Autoregressive Flows (MAF) (Papamakarios et al., 2017), and to accelerate state-of-the-art WaveNet-based speech synthesis to 20x faster than real-time (Oord et al., 2017), via Inverse Autoregressive Flows (IAF) (Kingma et al., 2016). We unify and generalize these approaches, replacing the (conditionally) affine univariate transformations of MAF/IAF with a more general class of invertible univariate transformations expressed as monotonic neural networks. We demonstrate that the proposed neural autoregressive flows (NAF) are universal approximators for continuous probability distributions, and their greater expressivity allows them to better capture multimodal target distributions. Experimentally, NAF yields state-of-the-art performance on a suite of density estimation tasks and outperforms IAF in variational autoencoders trained on binarized MNIST. 1. Introduction: Invertible transformations with a tractable Jacobian, also known as normalizing flows, are useful tools in many machine learning problems, for example: (1) In the context of deep generative models, training necessitates evaluating data samples under the model's inverse transformation (Dinh et al., 2017). Tractable density is an appealing property for these models, since it allows the objective of interest to be directly optimized; whereas other mainstream methods rely on alternative losses, in the case of intractable density models (Kingma & Welling, 2014; Rezende et al., 2014). *Equal contribution. 1 MILA, University of Montreal, 2 Element AI, 3 CIFAR fellow. Correspondence to: Chin-Wei Huang <chinwei.huang@umontreal.ca>. (The affine step that NAF generalizes is sketched after this table.) |
| Pseudocode | No | The paper describes the proposed method using mathematical equations and diagrams, but does not include any explicit pseudocode or algorithm blocks. |
| Open Source Code | Yes | 1Implementation can be found at https://github.com/CWHuang/NAF/ |
| Open Datasets | Yes | For larger-scale experiments, we show that using NAF instead of IAF to approximate the posterior distribution of latent variables in a variational autoencoder (Kingma & Welling, 2014; Rezende et al., 2014) yields better likelihood results on binarized MNIST (Larochelle & Murray, 2011) (Section 6.3). Finally, we report our experimental results on density estimation of a suite of UCI datasets (Section 6.4). |
| Dataset Splits | No | The paper reports "validation results" in Table 2, but does not provide explicit details about the specific training/validation/test splits (e.g., percentages, sample counts, or references to predefined splits) used for these datasets. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for running experiments (e.g., GPU models, CPU types, or cloud instance specifications). |
| Software Dependencies | No | The paper mentions deep learning frameworks but does not explicitly list specific software dependencies with their version numbers required to reproduce the experiments. |
| Experiment Setup | Yes | We find that small neural network transformers of 1 or 2 hidden layers with 8 or 16 sigmoid units perform well across our experiments, although there are other possibilities worth exploring (see Section 3.3). Sigmoids contain inflection points, and so can easily induce inflection points in τ_c, and thus multimodality in p(y_t). (A minimal sketch of such a sigmoidal transformer follows this table.) |
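
The affiliation excerpt quotes the abstract's claim that NAF replaces the (conditionally) affine univariate transformations of MAF/IAF with monotonic neural networks. For orientation only (this is not the authors' code, and the names are ours), the affine step being generalized can be written as:

```python
def affine_iaf_step(x_t, mu_t, sigma_t):
    """(Conditionally) affine univariate transform used by IAF/MAF.

    mu_t and sigma_t (> 0) would be produced by an autoregressive
    conditioner over x_{1:t-1}; NAF swaps this map for a monotonic
    neural network, sketched below.
    """
    return mu_t + sigma_t * x_t
```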
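
The "Experiment Setup" row describes transformers of 1 or 2 hidden layers with 8 or 16 sigmoid units. Below is a minimal NumPy sketch, under our own naming and the assumption that the pseudo-parameters `a`, `b`, `w` come from an autoregressive conditioner as in the paper; it shows a single-hidden-layer sigmoidal transformer whose sigmoid units can induce the inflection points mentioned in the quote. The paper additionally tracks the log-determinant of the Jacobian through these operations for density evaluation; the sketch omits that bookkeeping.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def inv_sigmoid(y):
    # logit; assumes 0 < y < 1
    return np.log(y) - np.log(1.0 - y)

def softplus(x):
    return np.log1p(np.exp(x))

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def sigmoidal_transformer(x_t, a, b, w):
    """Single-hidden-layer sigmoidal transformer tau_c (hypothetical sketch).

    a, b, w play the role of pseudo-parameters that an autoregressive
    conditioner c(x_{1:t-1}) would output in NAF; here they are free arrays
    with one entry per hidden sigmoid unit. softplus keeps the slopes
    positive and softmax keeps the mixing weights positive and normalized,
    so the map is strictly monotonic (hence invertible) in x_t.
    """
    pre = softplus(a) * x_t + b      # per-unit affine pre-activation
    hidden = sigmoid(pre)            # sigmoid units can create inflection points
    y = np.dot(softmax(w), hidden)   # convex combination, lies in (0, 1)
    return inv_sigmoid(y)            # map back to the real line

# Instantiate with 8 sigmoid units, matching the 8-16 units quoted above.
rng = np.random.default_rng(0)
a, b, w = rng.normal(size=(3, 8))
print(sigmoidal_transformer(0.5, a, b, w))
```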