Inference Suboptimality in Variational Autoencoders
Authors: Chris Cremer, Xuechen Li, David Duvenaud
ICML 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments investigate how the choice of encoder, posterior approximation, decoder, and optimization affect the approximation and amortization gaps. We train VAE models in a number of settings on the MNIST (LeCun et al., 1998), Fashion-MNIST (Xiao et al., 2017), and CIFAR-10 (Krizhevsky & Hinton, 2009) datasets. |
| Researcher Affiliation | Academia | 1Department of Computer Science, University of Toronto, Toronto, Canada. |
| Pseudocode | No | The paper describes mathematical formulations and procedures but does not include any pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide any links or explicit statements about the availability of open-source code for the described methodology. |
| Open Datasets | Yes | We train VAE models in a number of settings on the MNIST (LeCun et al., 1998), Fashion-MNIST (Xiao et al., 2017), and CIFAR-10 (Krizhevsky & Hinton, 2009) datasets. |
| Dataset Splits | No | The paper mentions evaluating on subsets of training and validation datapoints but does not specify the exact split percentages or sample counts needed to reproduce the data partitioning. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., GPU models, CPU types, memory) used for running the experiments. |
| Software Dependencies | No | The paper mentions optimizers and initialization techniques by name but does not provide specific version numbers for software dependencies or libraries. |
| Experiment Setup | Yes | For the local optimization of q_FFG, we initialize the mean and variance as the prior, i.e. N(0, I). We optimize the mean and variance using the Adam optimizer with a learning rate of 10^-3. To determine convergence, after every 100 optimization steps, we compute the average of the previous 100 ELBO values and compare it to the best achieved average. If it does not improve for 10 consecutive iterations then the optimization is terminated. For q_Flow and q_AF, the same process is used to optimize all of their parameters. All neural nets for the flow were initialized with a variant of the Xavier initialization (Glorot & Bengio, 2010). We use 100 Monte Carlo samples to compute the ELBO to reduce variance. |
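The Experiment Setup row above describes the per-datapoint optimization of the fully-factorized Gaussian posterior q_FFG used to measure the amortization gap. Below is a minimal sketch of that loop, assuming a PyTorch decoder `decoder(z)` that returns Bernoulli logits over a binarized input `x`; the function name, the decoder interface, and the Bernoulli likelihood are our assumptions for illustration, not details taken from the paper.

```python
import torch


def local_ffg_elbo_optimization(x, decoder, z_dim, n_samples=100,
                                lr=1e-3, check_every=100, patience=10):
    """Optimize a per-datapoint fully-factorized Gaussian q(z|x) by maximizing the ELBO."""
    # Initialize the variational mean and (log-)variance at the prior N(0, I).
    mu = torch.zeros(z_dim, requires_grad=True)
    log_var = torch.zeros(z_dim, requires_grad=True)
    optimizer = torch.optim.Adam([mu, log_var], lr=lr)

    best_avg, bad_checks, recent, step = -float("inf"), 0, [], 0
    while True:
        optimizer.zero_grad()
        # Reparameterized samples z = mu + sigma * eps, averaged over 100 MC samples.
        eps = torch.randn(n_samples, z_dim)
        std = torch.exp(0.5 * log_var)
        z = mu + std * eps
        logits = decoder(z)  # shape: (n_samples, x_dim), Bernoulli logits (assumed)
        log_px_z = -torch.nn.functional.binary_cross_entropy_with_logits(
            logits, x.expand_as(logits), reduction="none").sum(-1)
        log_pz = torch.distributions.Normal(0.0, 1.0).log_prob(z).sum(-1)
        log_qz = torch.distributions.Normal(mu, std).log_prob(z).sum(-1)
        elbo = (log_px_z + log_pz - log_qz).mean()
        (-elbo).backward()
        optimizer.step()

        # Every 100 steps, compare the average of the last 100 ELBO values to the
        # best average seen so far; stop after 10 consecutive non-improving checks.
        recent.append(elbo.item())
        step += 1
        if step % check_every == 0:
            avg = sum(recent[-check_every:]) / check_every
            if avg > best_avg:
                best_avg, bad_checks = avg, 0
            else:
                bad_checks += 1
            if bad_checks >= patience:
                return mu.detach(), log_var.detach(), best_avg
```

The same stopping rule would apply when optimizing all parameters of the flow-based posteriors (q_Flow, q_AF); only the set of optimized parameters changes.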