All in the Exponential Family: Bregman Duality in Thermodynamic Variational Inference

Authors: Rob Brekelmans, Vaden Masrani, Frank Wood, Greg Ver Steeg, Aram Galstyan

ICML 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We report test log p(x) values from training a separate VAE at each β1, but this grid search is prohibitively expensive in practice. Our moment-spacing schedule is an adaptive method for choosing β points, which yields near-optimal performance on Omniglot and provides notable improvement over the ELBO. [...] We investigate the effect of our moment-spacing schedule and reparameterization gradients using a continuous latent variable model on the Omniglot dataset. We estimate test log p(x) using the IWAE bound (Burda et al., 2015) with 5k samples, and use S = 50 samples for training unless noted. In all plots, we report averages over five random seeds, with error bars indicating min and max values. (A sketch of the moment-spacing schedule appears after the table.)
Researcher Affiliation | Academia | ¹Information Sciences Institute, University of Southern California, Marina del Rey, CA; ²University of British Columbia, Vancouver, CA. Correspondence to: Rob Brekelmans <brekelma@usc.edu>, Vaden Masrani <vadmas@cs.ubc.ca>.
Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks.
Open Source Code | Yes | We describe our model architecture and experiment design in App. F [1], with runtimes and additional results on binary MNIST in App. H. [1] https://github.com/vmasrani/tvo_all_in
Open Datasets | Yes | We investigate the effect of our moment-spacing schedule and reparameterization gradients using a continuous latent variable model on the Omniglot dataset. [...] additional results on binary MNIST in App. H.
Dataset Splits | No | The paper mentions using a 'test' split for evaluation and 'samples for training', but does not provide specific percentages, sample counts, or citations to predefined train/validation/test splits. While Omniglot and MNIST have standard splits, these are not explicitly stated in the paper's text.
Hardware Specification | No | The paper acknowledges 'computational resources provided by WestGrid (https://www.westgrid.ca/) and Compute Canada (www.computecanada.ca)', but these are general computing consortia; no exact GPU/CPU models, memory amounts, or other specific hardware details are given.
Software Dependencies | No | The paper does not provide specific software dependency versions (e.g., library or solver names with version numbers such as PyTorch 1.9 or Python 3.8).
Experiment Setup | Yes | We estimate test log p(x) using the IWAE bound (Burda et al., 2015) with 5k samples, and use S = 50 samples for training unless noted. In all plots, we report averages over five random seeds, with error bars indicating min and max values. [...] From App. G ('Implementation Details'): We found that a schedule of K = 50 with 1e-4 initial learning rate, linear decay, and Adam optimizer parameters β1 = 0.9, β2 = 0.999 performed well for our models. (An IWAE evaluation sketch appears after the table.)
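
The moment-spacing schedule quoted under Research Type picks the TVO integration points β0 = 0 < β1 < ... < βK = 1 so that the moments η(β) = E_{π_β}[log p(x,z) − log q(z|x)] are equally spaced between their β = 0 and β = 1 values. Below is a minimal NumPy sketch of that idea, assuming access to per-sample log importance weights computed under q and estimating the moment curve by self-normalized importance sampling over a fine β grid; the function name, grid size, and inversion-by-lookup are illustrative choices, not the released implementation.

```python
import numpy as np
from scipy.special import logsumexp

def moment_spacing_schedule(log_w, K, grid_size=1000):
    """Return K+1 beta points (including 0 and 1) whose moments
    eta(beta) = E_{pi_beta}[log w] are approximately equally spaced.

    log_w: 1-D array of log importance weights log p(x, z) - log q(z | x)
           for samples z ~ q(z | x), i.e., drawn at beta = 0.
    """
    betas = np.linspace(0.0, 1.0, grid_size)
    moments = np.empty(grid_size)
    for i, beta in enumerate(betas):
        # pi_beta is proportional to q(z|x) * w^beta, so the self-normalized
        # importance weights for samples from q are propto exp(beta * log_w).
        snis = np.exp(beta * log_w - logsumexp(beta * log_w))
        moments[i] = np.sum(snis * log_w)  # estimate of eta(beta)
    # eta(beta) is nondecreasing (its derivative is a variance), so the curve
    # can be inverted by nearest-grid lookup at equally spaced target moments.
    targets = np.linspace(moments[0], moments[-1], K + 1)
    idx = np.clip(np.searchsorted(moments, targets), 0, grid_size - 1)
    schedule = betas[idx]
    schedule[0], schedule[-1] = 0.0, 1.0  # pin the endpoints exactly
    return schedule
```

The resulting β points feed the TVO's Riemann-sum bound, which is why well-placed points translate into the 'notable improvement over the ELBO' quoted above.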
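The Experiment Setup row evaluates test log p(x) with the IWAE bound (Burda et al., 2015), i.e., log p(x) ≈ logsumexp_i(log w_i) − log S with S = 5000. Here is a minimal PyTorch sketch of that estimator, assuming hypothetical callables sample_z, log_joint, and log_q for the model and inference network; the chunking is only there to keep 5k samples memory-friendly and is not taken from the linked repository.

```python
import math
import torch

def iwae_log_px(x, sample_z, log_joint, log_q, S=5000, chunk=500):
    """IWAE estimate of log p(x) for one datapoint x:
        log p(x) ~= logsumexp_i [log p(x, z_i) - log q(z_i | x)] - log S,
    with z_i ~ q(z | x). The bound tightens as S grows (Burda et al., 2015).
    """
    log_w = []
    with torch.no_grad():
        for _ in range(S // chunk):
            z = sample_z(x, chunk)                       # [chunk, ...] draws from q(z | x)
            log_w.append(log_joint(x, z) - log_q(x, z))  # [chunk] log weights
    log_w = torch.cat(log_w)                             # [S]
    return torch.logsumexp(log_w, dim=0) - math.log(S)
```

Using 5k samples at test time while training with S = 50 matches the quoted setup: evaluation can afford a much larger S than training, and the larger S gives a tighter estimate of log p(x).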