Fixing a Broken ELBO
Authors: Alexander Alemi, Ben Poole, Ian Fischer, Joshua Dillon, Rif A. Saurous, Kevin Murphy
ICML 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In addition to our unifying theoretical framework, we empirically study the performance of a variety of different VAE models with both simple and complex encoders, decoders, and priors on several simple image datasets in terms of the RD curve. We show that VAEs with powerful autoregressive decoders can be trained to not ignore their latent code by targeting certain points on this curve. We also show how it is possible to recover the true generative process (up to reparameterization) of a simple model on a synthetic dataset with no prior knowledge except for the true value of the mutual information I (derived from the true generative model). We believe that information constraints provide an interesting alternative way to regularize the learning of latent variable models. |
| Researcher Affiliation | Collaboration | ¹Google AI, ²Stanford University. Correspondence to: Alexander A. Alemi <alemi@google.com>. |
| Pseudocode | No | The paper does not contain any pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not contain any explicit statement or link indicating the availability of open-source code for the described methodology. |
| Open Datasets | Yes | We use the static binary MNIST dataset from Larochelle & Murray (2011). |
| Dataset Splits | No | The paper identifies the dataset only by citation (the 'static binary MNIST dataset' of Larochelle & Murray, 2011) and does not explicitly state train/validation/test split percentages, sample counts, or the methodology used for data partitioning. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory specifications) used for running the experiments. |
| Software Dependencies | No | The paper does not mention any specific software dependencies or libraries with version numbers. |
| Experiment Setup | Yes | We train them all to minimize the β-VAE objective in Equation 6. Full details can be found in Appendix F. Runs were performed at various values of β ranging from 0.1 to 10.0, both with and without KL annealing (Bowman et al., 2016). |
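For context on the quoted setup: the paper's rate–distortion framing writes the β-VAE objective (its Equation 6) as distortion plus β times rate. A reconstruction of that form is shown below; the notation here follows common VAE convention (encoder q_φ, decoder/prior p_θ) rather than the paper's own e/m/d symbols.

```latex
\min_{\theta,\phi}\; D + \beta R,
\qquad
R = \mathbb{E}_{p^*(x)}\!\left[\mathrm{KL}\!\left(q_\phi(z \mid x)\,\middle\|\,p_\theta(z)\right)\right],
\qquad
D = -\,\mathbb{E}_{p^*(x)}\,\mathbb{E}_{q_\phi(z \mid x)}\!\left[\log p_\theta(x \mid z)\right]
```

Sweeping β traces out points on the rate–distortion (RD) curve that the paper studies: small β tolerates a high rate (more information in the latent code), while large β pushes the rate toward zero, which is how powerful autoregressive decoders come to ignore the latent code at β = 1.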
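Below is a minimal, self-contained Python sketch of how that objective is computed for binarized MNIST-style data. This is not the authors' code: the shapes, hyperparameters, and toy inputs are illustrative assumptions, and only the loss form D + βR and the β range come from the paper.

```python
# Sketch of the beta-VAE objective the quoted setup minimizes:
#   L(beta) = D + beta * R
# where R is the expected KL from encoder to prior (rate) and D is the
# expected negative reconstruction log-likelihood (distortion).
import numpy as np

rng = np.random.default_rng(0)

def gaussian_kl(mu, logvar):
    """Analytic per-example KL(q(z|x) || p(z)) for a diagonal Gaussian
    encoder against a standard normal prior p(z) = N(0, I)."""
    return 0.5 * np.sum(np.exp(logvar) + mu**2 - 1.0 - logvar, axis=-1)

def bernoulli_nll(x, logits):
    """Per-example negative log-likelihood of binary pixels under a
    Bernoulli decoder parameterized by logits (numerically stable form)."""
    return np.sum(np.maximum(logits, 0) - logits * x
                  + np.log1p(np.exp(-np.abs(logits))), axis=-1)

def beta_vae_loss(x, mu, logvar, recon_logits, beta=1.0):
    """Distortion + beta * rate, averaged over the batch (nats)."""
    rate = gaussian_kl(mu, logvar)               # R
    distortion = bernoulli_nll(x, recon_logits)  # D
    return np.mean(distortion + beta * rate)

# Toy batch standing in for encoder/decoder outputs: 8 binarized
# 784-pixel "images" and a 32-dimensional latent (assumed shapes).
x = (rng.random((8, 784)) > 0.5).astype(np.float64)
mu = rng.normal(size=(8, 32))
logvar = rng.normal(size=(8, 32))
recon_logits = rng.normal(size=(8, 784))

# The quoted setup sweeps beta from 0.1 to 10.0; KL annealing would
# additionally ramp the weight on the rate term from ~0 to its target
# over the early steps of training.
for beta in (0.1, 1.0, 10.0):
    print(f"beta={beta:4.1f}  loss={beta_vae_loss(x, mu, logvar, recon_logits, beta):.2f}")
```

In a real run the encoder statistics (mu, logvar) and decoder logits would come from trained networks rather than random draws; the point of the sketch is that each target point on the RD curve corresponds to one fixed β in this loss.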