Gaussian Process Bandit Optimization of the Thermodynamic Variational Objective

Authors: Vu Nguyen, Vaden Masrani, Rob Brekelmans, Michael Osborne, Frank Wood

NeurIPS 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Empirical validation of our algorithm is provided in terms of improved learning and inference in Variational Autoencoders and Sigmoid Belief Networks.
Researcher Affiliation | Academia | Vu Nguyen, University of Oxford (vu@robots.ox.ac.uk); Vaden Masrani, University of British Columbia (vadmas@cs.ubc.ca); Rob Brekelmans, USC Information Sciences Institute (brekelma@usc.edu); Michael A. Osborne, University of Oxford (mosb@robots.ox.ac.uk); Frank Wood, University of British Columbia (fwood@cs.ubc.ca)
Pseudocode | Yes | Algorithm 1: GP-bandit for TVO (high level)
Open Source Code | Yes | Our code is available at http://github.com/ntienvu/tvo_gp_bandit.
Open Datasets | Yes | We demonstrate the effectiveness of our method for training VAEs [17] on MNIST and Fashion MNIST, and a Sigmoid Belief Network [27] on binarized MNIST and binarized Omniglot, using the TVO objective.
Dataset Splits | No | The paper mentions evaluating on 'test log evidence' and 'test KL divergence' but does not specify the explicit train/validation/test dataset splits (e.g., percentages or sample counts) used for reproduction.
Hardware Specification | No | The paper mentions using computational resources from WestGrid and Compute Canada in the acknowledgments, but it does not provide specific hardware details such as exact GPU/CPU models, processor types, or memory amounts used for the experiments.
Software Dependencies | No | The paper does not provide specific software dependencies, such as library names with version numbers (e.g., 'Python 3.8, PyTorch 1.9'), needed to replicate the experiments.
Experiment Setup | Yes | All continuous VAEs use a two-layer encoder and decoder with 200 hidden units per layer and a 20-dimensional latent space. All experiments use the Adam [16] optimizer with a learning rate of 1e-3 and gradient clipping at 100. We evaluate our GP-bandit for S ∈ {10, 50} and d ∈ {2, 5, 10, 15} and, for each configuration, train until convergence using 5 random seeds. We set the update frequency w = 6 initially and increment w by one after every 10 bandit iterations to account for smaller objective changes later in training, and update early if ∆L_t ≥ 0.05. We found that selecting β_j too close to either 0 or 1 could negatively affect performance, and thus restrict β ∈ [0.05, 0.95]^d in all experiments.
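
As a concrete reading of the Experiment Setup row, the following is a minimal PyTorch sketch of the stated architecture and optimizer settings (two-layer encoder/decoder, 200 hidden units, 20-dimensional latent space, Adam at 1e-3, gradient clipping at 100). The tanh activations, the 784-dimensional flattened-image input, and all class and function names are illustrative assumptions, not the authors' released code.

```python
import torch
import torch.nn as nn

LATENT_DIM = 20   # 20-dimensional latent space (from the paper)
HIDDEN = 200      # 200 hidden units per layer (from the paper)

class Encoder(nn.Module):
    """Two-layer encoder producing Gaussian mean and log-variance (activation is an assumption)."""
    def __init__(self, input_dim=784):  # 784 = flattened 28x28 image, assumed
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(input_dim, HIDDEN), nn.Tanh(),
            nn.Linear(HIDDEN, HIDDEN), nn.Tanh(),
        )
        self.mu = nn.Linear(HIDDEN, LATENT_DIM)
        self.logvar = nn.Linear(HIDDEN, LATENT_DIM)

    def forward(self, x):
        h = self.net(x)
        return self.mu(h), self.logvar(h)

class Decoder(nn.Module):
    """Two-layer decoder mapping latents back to Bernoulli logits (likelihood choice assumed)."""
    def __init__(self, output_dim=784):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(LATENT_DIM, HIDDEN), nn.Tanh(),
            nn.Linear(HIDDEN, HIDDEN), nn.Tanh(),
            nn.Linear(HIDDEN, output_dim),
        )

    def forward(self, z):
        return self.net(z)

encoder, decoder = Encoder(), Decoder()
params = list(encoder.parameters()) + list(decoder.parameters())
optimizer = torch.optim.Adam(params, lr=1e-3)  # Adam with learning rate 1e-3

def step(loss):
    """One optimization step with gradient clipping at 100, as stated in the setup."""
    optimizer.zero_grad()
    loss.backward()
    torch.nn.utils.clip_grad_norm_(params, max_norm=100.0)
    optimizer.step()
```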
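
The bandit-related settings in the same row (schedule dimension d, the β ∈ [0.05, 0.95]^d restriction, and the update frequency w = 6 incremented every 10 bandit iterations) can likewise be illustrated with a rough GP-bandit loop. The sketch below is an assumption-laden illustration of the general technique, not the paper's Algorithm 1: it uses scikit-learn's GaussianProcessRegressor with a GP-UCB acquisition, `train_for_w_epochs` is a hypothetical placeholder for training the model under a fixed schedule, and the improvement-based reward is a guessed stand-in for the paper's reward definition.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

D = 5                      # number of schedule points beta_1..beta_d (one of the tested values)
BOUNDS = (0.05, 0.95)      # beta restricted to [0.05, 0.95]^d
betas_seen, rewards = [], []
gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True)

def propose_beta(n_candidates=2000, kappa=2.0):
    """Pick the candidate schedule maximizing a GP-UCB score over random candidates."""
    cands = np.random.uniform(*BOUNDS, size=(n_candidates, D))
    cands.sort(axis=1)                       # keep each schedule in increasing order
    if not betas_seen:                       # no observations yet: return a random schedule
        return cands[0]
    mu, std = gp.predict(cands, return_std=True)
    return cands[np.argmax(mu + kappa * std)]

w = 6                                        # initial update frequency, in epochs
prev_obj = None
for t in range(50):                          # bandit iterations
    beta = propose_beta()
    obj = train_for_w_epochs(beta, w)        # hypothetical: train with the TVO at this schedule
    reward = obj if prev_obj is None else obj - prev_obj   # improvement used as bandit reward (assumed)
    prev_obj = obj
    betas_seen.append(beta)
    rewards.append(reward)
    gp.fit(np.array(betas_seen), np.array(rewards))
    if (t + 1) % 10 == 0:                    # increment w after every 10 bandit iterations
        w += 1
```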