Variational Inference with Normalizing Flows

Authors: Danilo Rezende, Shakir Mohamed

ICML 2015 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We demonstrate that the theoretical advantages of having posteriors that better match the true posterior, combined with the scalability of amortized variational approaches, provides a clear improvement in performance and applicability of variational inference. We show experimentally that the use of general normalizing flows systematically outperforms other competing approaches for posterior approximation."
Researcher Affiliation | Industry | "Danilo Jimenez Rezende DANILOR@GOOGLE.COM; Shakir Mohamed SHAKIR@GOOGLE.COM; Google DeepMind, London"
Pseudocode | Yes | "Algorithm 1 Variational Inf. with Normalizing Flows" (a rough sketch of the planar flow step that this algorithm composes appears below the table).
Open Source Code | No | The paper does not include any explicit statements or links indicating that source code for the methodology is publicly available.
Open Datasets | Yes | "The MNIST digit dataset (Le Cun & Cortes, 1998) contains 60,000 training and 10,000 test images of ten handwritten digits (0 to 9) that are 28x28 pixels in size. We used the binarized dataset as in (Uria et al., 2014). The CIFAR-10 natural images dataset (Krizhevsky & Hinton, 2010) consists of 50,000 training and 10,000 test RGB images that are of size 3x32x32 pixels from which we extract 3x8x8 random patches." (an illustrative preprocessing sketch appears below the table).
Dataset Splits | No | The paper reports 60,000 training and 10,000 test images for MNIST and 50,000 training and 10,000 test images for CIFAR-10, but it does not specify a validation split or explicit percentages for all splits.
Hardware Specification | No | The paper does not specify any details about the hardware (e.g., GPU models, CPU types, memory) used for the experiments.
Software Dependencies | No | The paper names methods such as the Maxout non-linearity and RMSprop optimization, but it does not list software libraries or provide version numbers for any dependencies.
Experiment Setup | Yes | "The deep neural networks that form the conditional probability between random variables consist of deterministic layers with 400 hidden units using the Maxout non-linearity on windows of 4 variables (Goodfellow et al., 2013). We use mini-batches of 100 data points and RMSprop optimization (with learning rate = 1e-5 and momentum = 0.9) (Kingma & Welling, 2014; Rezende et al., 2014). Results were collected after 500,000 parameter updates. Each experiment was repeated 100 times with different random seeds and we report the averaged scores and standard errors." (an illustrative configuration sketch appears below the table).
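
To make the "Pseudocode" row concrete: Algorithm 1 in the paper draws a sample from a base posterior and pushes it through a chain of invertible maps, accumulating log-det-Jacobian terms for the free-energy bound. The following is a minimal NumPy sketch of the planar flow family described in the paper; function and variable names are ours, and the reparameterization of u that guarantees invertibility (given in the paper's appendix) is omitted for brevity.

import numpy as np

def planar_flow(z, u, w, b):
    """One planar flow step f(z) = z + u * tanh(w.z + b) and its log-det-Jacobian.

    Shapes (illustrative): z, u, and w are vectors of dimension D, b is a scalar.
    Invertibility requires w.u >= -1; the paper enforces this by reparameterizing u,
    which is omitted here for brevity.
    """
    a = np.dot(w, z) + b               # pre-activation, scalar
    f_z = z + u * np.tanh(a)           # transformed sample
    psi = (1.0 - np.tanh(a) ** 2) * w  # psi(z) = h'(w.z + b) * w
    log_det = np.log(np.abs(1.0 + np.dot(u, psi)))  # planar-flow determinant term
    return f_z, log_det

def apply_flow(z0, params):
    """Compose K planar flows (the transformation step of Algorithm 1).

    `params` is a list of (u, w, b) tuples, one per flow step; returns the final
    sample z_K and the accumulated sum of log-det-Jacobian terms, which enters
    the flow-based free-energy bound.
    """
    z, log_det_sum = z0, 0.0
    for (u, w, b) in params:
        z, log_det = planar_flow(z, u, w, b)
        log_det_sum += log_det
    return z, log_det_sum

# Toy usage: a length-4 flow applied to a 2-D latent sample from the base posterior.
rng = np.random.default_rng(0)
z0 = rng.standard_normal(2)
params = [(rng.standard_normal(2) * 0.1, rng.standard_normal(2), 0.0) for _ in range(4)]
zK, log_det_sum = apply_flow(z0, params)
print(zK, log_det_sum)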
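
For the "Open Datasets" row, the paper describes binarized MNIST (following Uria et al., 2014) and random 3x8x8 patches from CIFAR-10. The sketch below only illustrates those two preprocessing steps with NumPy; the Bernoulli binarization is a stand-in, since the exact binarization protocol of Uria et al. (2014) is not restated in the paper, and all array names are ours.

import numpy as np

rng = np.random.default_rng(0)

def binarize(images):
    """Illustrative stochastic binarization of grey-scale MNIST images in [0, 1].

    The paper uses the binarized dataset of Uria et al. (2014); this Bernoulli
    sampling is only a stand-in for that protocol.
    """
    return (rng.random(images.shape) < images).astype(np.float32)

def random_patches(images, n_patches, size=8):
    """Extract random size x size patches from CIFAR-10 images of shape
    (N, 3, 32, 32), matching the 3x8x8 patches described in the paper."""
    n, c, h, w = images.shape
    idx = rng.integers(0, n, n_patches)
    ys = rng.integers(0, h - size + 1, n_patches)
    xs = rng.integers(0, w - size + 1, n_patches)
    return np.stack([images[i, :, y:y + size, x:x + size]
                     for i, y, x in zip(idx, ys, xs)])

# Toy usage with random stand-in arrays of the right shapes.
mnist = rng.random((100, 28 * 28))
cifar = rng.random((100, 3, 32, 32))
print(binarize(mnist).shape, random_patches(cifar, 32).shape)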
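
For the "Experiment Setup" row, the reported hyperparameters can be collected into a single configuration, together with the Maxout non-linearity on windows of 4 units. The sketch below is illustrative only: the RMSprop decay constant and the exact RMSprop-with-momentum update rule are assumptions, since the paper reports only the learning rate and momentum.

import numpy as np

# Hyperparameters reported in the paper; anything marked "assumed" is not stated there.
CONFIG = {
    "hidden_units": 400,      # Maxout hidden units per deterministic layer
    "maxout_window": 4,       # max taken over windows of 4 linear pieces
    "batch_size": 100,
    "learning_rate": 1e-5,    # RMSprop
    "momentum": 0.9,
    "rms_decay": 0.95,        # assumed: the RMSprop decay is not given in the paper
    "num_updates": 500_000,
    "num_seeds": 100,
}

def maxout(x, W, b, window=4):
    """Maxout layer: a linear map to hidden_units * window features, followed by a
    max over each window of `window` consecutive features (Goodfellow et al., 2013)."""
    pre = x @ W + b                               # (batch, hidden_units * window)
    pre = pre.reshape(x.shape[0], -1, window)     # (batch, hidden_units, window)
    return pre.max(axis=-1)

def rmsprop_momentum_step(theta, grad, state, cfg=CONFIG):
    """One RMSprop-with-momentum update; this particular formulation is an
    assumption, as the paper reports only the learning rate and momentum."""
    ms, mom = state
    ms = cfg["rms_decay"] * ms + (1 - cfg["rms_decay"]) * grad ** 2
    mom = cfg["momentum"] * mom + cfg["learning_rate"] * grad / np.sqrt(ms + 1e-8)
    return theta - mom, (ms, mom)

# Toy usage: one Maxout forward pass and one parameter update on random data.
rng = np.random.default_rng(0)
x = rng.standard_normal((CONFIG["batch_size"], 784))
W = rng.standard_normal((784, CONFIG["hidden_units"] * CONFIG["maxout_window"])) * 0.01
b = np.zeros(CONFIG["hidden_units"] * CONFIG["maxout_window"])
h = maxout(x, W, b, CONFIG["maxout_window"])
theta, state = W, (np.zeros_like(W), np.zeros_like(W))
theta, state = rmsprop_momentum_step(theta, rng.standard_normal(W.shape) * 0.01, state)
print(h.shape, theta.shape)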