All in the Exponential Family: Bregman Duality in Thermodynamic Variational Inference
Authors: Rob Brekelmans, Vaden Masrani, Frank Wood, Greg Ver Steeg, Aram Galstyan
ICML 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We report test log p(x) values from training a separate VAE at each β1, but this grid search is prohibitively expensive in practice. Our moment-spacing schedule is an adaptive method for choosing β points, which yields near-optimal performance on Omniglot and provides notable improvement over the ELBO. [...] We investigate the effect of our moment-spacing schedule and reparameterization gradients using a continuous latent variable model on the Omniglot dataset. We estimate test log p(x) using the IWAE bound (Burda et al., 2015) with 5k samples, and use S = 50 samples for training unless noted. In all plots, we report averages over five random seeds, with error bars indicating min and max values. *(Hedged sketches of the moment-spacing schedule and the IWAE estimator follow the table.)* |
| Researcher Affiliation | Academia | 1Information Sciences Institute, University of Southern California, Marina del Rey, CA 2University of British Columbia, Vancouver, CA. Correspondence to: Rob Brekelmans <brekelma@usc.edu>, Vaden Masrani <vadmas@cs.ubc.ca>. |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | We describe our model architecture and experiment design in App. F [1], with runtimes and additional results on binary MNIST in App. H. [1] https://github.com/vmasrani/tvo_all_in |
| Open Datasets | Yes | We investigate the effect of our moment-spacing schedule and reparameterization gradients using a continuous latent variable model on the Omniglot dataset. [...] additional results on binary MNIST in App. H. |
| Dataset Splits | No | The paper mentions using a 'test' split for evaluation and 'samples for training', but does not provide specific percentages, sample counts, or citations to predefined train/validation/test splits. While Omniglot and MNIST have standard splits, these are not explicitly stated within the paper's text. |
| Hardware Specification | No | The paper mentions 'computational resources provided by West Grid (https://www.westgrid.ca/) and Compute Canada (www.computecanada.ca)', but these are general consortia and do not specify exact GPU/CPU models, memory amounts, or other specific hardware details. |
| Software Dependencies | No | The paper does not provide specific software dependency versions (e.g., library or solver names with version numbers like PyTorch 1.9 or Python 3.8). |
| Experiment Setup | Yes | We estimate test log p(x) using the IWAE bound (Burda et al., 2015) with 5k samples, and use S = 50 samples for training unless noted. In all plots, we report averages over five random seeds, with error bars indicating min and max values. [...] From App. G, 'Implementation Details': We found that a schedule of K = 50 with 1e-4 initial learning rate, linear decay, and Adam optimizer parameters β1 = 0.9, β2 = 0.999 performed well for our models. *(An optimizer sketch follows the table.)* |
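
The moment-spacing schedule quoted in the Research Type row is the paper's adaptive rule for placing the β points of the thermodynamic integration path. As a rough illustration of the idea only (not the authors' implementation; the linked repository is authoritative), the sketch below spaces β so that the moments η(β) = E_{π_β}[log w] are approximately uniform, estimating each moment with self-normalized importance weights over samples from q. The input `log_w` and the grid resolution are assumptions.

```python
import torch

def moment_spacing_schedule(log_w, K=50, grid_size=1000):
    """Pick K+1 beta points in [0, 1] so that the moments
    eta(beta) = E_{pi_beta}[log w] are roughly equally spaced.

    log_w: log importance weights log p(x, z) - log q(z | x) for z ~ q,
           shape (num_samples,).  Hypothetical input format.
    """
    betas = torch.linspace(0.0, 1.0, grid_size)
    # pi_beta(z|x) is proportional to q(z|x) * w^beta, so the self-normalized
    # weights at each grid point are softmax(beta * log_w).
    weights = torch.softmax(betas.unsqueeze(1) * log_w, dim=1)  # (grid_size, num_samples)
    eta = (weights * log_w).sum(dim=1)  # estimated eta(beta) on the grid
    # eta(beta) is nondecreasing in beta, so invert it on the grid to hit
    # K+1 equally spaced moment targets.
    targets = torch.linspace(eta[0].item(), eta[-1].item(), K + 1)
    idx = torch.searchsorted(eta, targets).clamp(max=grid_size - 1)
    return betas[idx]
```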
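
The evaluation protocol quoted twice above, test log p(x) via the IWAE bound (Burda et al., 2015) with 5k samples, reduces to a log-mean-exp of log importance weights. A minimal sketch, assuming a model exposing `log_joint(x, z)` and a posterior `q(x)` returning a `torch.distributions` object (hypothetical names; see the linked repository for the real interfaces):

```python
import math
import torch

@torch.no_grad()
def iwae_log_px(model, x, S=5000, chunk=500):
    """IWAE estimate of log p(x): log (1/S) sum_s p(x, z_s) / q(z_s | x),
    computed in chunks of samples to bound memory (S divisible by chunk)."""
    q = model.q(x)  # variational posterior q(z | x); assumed API
    log_w = []
    for _ in range(S // chunk):
        z = q.sample((chunk,))
        log_w.append(model.log_joint(x, z) - q.log_prob(z))
    log_w = torch.cat(log_w)  # (S,) log importance weights
    return torch.logsumexp(log_w, dim=0) - math.log(S)
```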
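
App. G's optimizer settings quoted in the Experiment Setup row translate directly into a PyTorch configuration. A hedged sketch, assuming the linear decay runs to zero over `num_epochs` (the decay horizon is not stated in the excerpt) and that `model` is the VAE being trained:

```python
import torch

# Quoted settings: 1e-4 initial learning rate, linear decay, Adam with
# beta1 = 0.9, beta2 = 0.999.  `model` and `num_epochs` are placeholders.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4, betas=(0.9, 0.999))
scheduler = torch.optim.lr_scheduler.LambdaLR(
    optimizer, lr_lambda=lambda epoch: 1.0 - epoch / num_epochs
)
# Call scheduler.step() once per epoch, after the optimizer updates.
```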