Inference Suboptimality in Variational Autoencoders

Authors: Chris Cremer, Xuechen Li, David Duvenaud

ICML 2018 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our experiments investigate how the choice of encoder, posterior approximation, decoder, and optimization affect the approximation and amortization gaps. We train VAE models in a number of settings on the MNIST (Le Cun et al., 1998), Fashion-MNIST (Xiao, 2017), and CIFAR-10 (Krizhevsky & Hinton, 2009) datasets.
Researcher Affiliation | Academia | Department of Computer Science, University of Toronto, Toronto, Canada.
Pseudocode | No | The paper describes mathematical formulations and procedures but does not include any pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide any links or explicit statements about the availability of open-source code for the described methodology.
Open Datasets | Yes | We train VAE models in a number of settings on the MNIST (Le Cun et al., 1998), Fashion-MNIST (Xiao, 2017), and CIFAR-10 (Krizhevsky & Hinton, 2009) datasets.
Dataset Splits | No | The paper mentions evaluating on subsets of training and validation datapoints but does not specify the exact split percentages or sample counts needed to reproduce the data partitioning.
Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., GPU models, CPU types, memory) used for running the experiments.
Software Dependencies | No | The paper mentions optimizers and initialization techniques by name but does not provide specific version numbers for software dependencies or libraries.
Experiment Setup | Yes | For the local optimization of q_FFG, we initialize the mean and variance as the prior, i.e., N(0, I). We optimize the mean and variance using the Adam optimizer with a learning rate of 10^-3. To determine convergence, after every 100 optimization steps, we compute the average of the previous 100 ELBO values and compare it to the best achieved average. If it does not improve for 10 consecutive iterations, the optimization is terminated. For q_Flow and q_AF, the same process is used to optimize all of their parameters. All neural nets for the flow were initialized with a variant of the Xavier initialization (Glorot & Bengio, 2010). We use 100 Monte Carlo samples to compute the ELBO to reduce variance.
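
The Experiment Setup row above spells out a concrete per-datapoint optimization loop. Below is a minimal sketch of that local fully-factorized-Gaussian (FFG) optimization, written in PyTorch as an assumption (the excerpt does not name a framework). The names `decoder`, `x`, and `latent_dim` are hypothetical placeholders; a Bernoulli likelihood over binarized pixels is assumed. This is an illustration of the described procedure, not the authors' implementation.

```python
# Minimal sketch (not the authors' code) of the local FFG optimization described in
# the Experiment Setup row: per-datapoint variational parameters initialized at the
# prior N(0, I), optimized with Adam (lr = 1e-3), ELBO estimated with 100 Monte Carlo
# samples, and early stopping based on 100-step ELBO averages with patience 10.
# `decoder`, `x`, and `latent_dim` are assumed placeholders; `x` is a binarized image
# vector and `decoder` maps a batch of latents to Bernoulli logits over pixels.
import torch


def elbo_ffg(x, mu, logvar, decoder, n_samples=100):
    """Monte Carlo ELBO estimate for a fully-factorized Gaussian q(z|x)."""
    std = torch.exp(0.5 * logvar)
    eps = torch.randn(n_samples, *mu.shape)
    z = mu + std * eps                                    # reparameterized samples
    logits = decoder(z)                                   # (n_samples, n_pixels) logits
    log_px_z = -torch.nn.functional.binary_cross_entropy_with_logits(
        logits, x.expand_as(logits), reduction="none"
    ).sum(-1)                                             # log p(x|z) per sample
    # Analytic KL( N(mu, diag(exp(logvar))) || N(0, I) ) for a diagonal Gaussian.
    kl = 0.5 * (mu.pow(2) + logvar.exp() - 1.0 - logvar).sum(-1)
    return (log_px_z - kl).mean()


def optimize_local_ffg(x, decoder, latent_dim, lr=1e-3, patience=10):
    """Optimize per-datapoint mean/log-variance, starting from the prior N(0, I)."""
    mu = torch.zeros(latent_dim, requires_grad=True)      # prior mean
    logvar = torch.zeros(latent_dim, requires_grad=True)  # prior log-variance
    opt = torch.optim.Adam([mu, logvar], lr=lr)

    elbos, best_avg, bad_checks = [], -float("inf"), 0
    while bad_checks < patience:
        for _ in range(100):                              # convergence check every 100 steps
            opt.zero_grad()
            loss = -elbo_ffg(x, mu, logvar, decoder)
            loss.backward()
            opt.step()
            elbos.append(-loss.item())
        avg = sum(elbos[-100:]) / 100.0                   # average of the previous 100 ELBOs
        if avg > best_avg:
            best_avg, bad_checks = avg, 0
        else:
            bad_checks += 1                               # no improvement: count toward patience
    return mu.detach(), logvar.detach(), best_avg
```

Checking convergence on 100-step ELBO averages, like using 100 Monte Carlo samples per estimate, keeps the stopping rule from reacting to sampling noise, consistent with the stated aim of reducing variance.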