Gaussian Process Bandit Optimization of the Thermodynamic Variational Objective
Authors: Vu Nguyen, Vaden Masrani, Rob Brekelmans, Michael Osborne, Frank Wood
NeurIPS 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirical validation of our algorithm is provided in terms of improved learning and inference in Variational Autoencoders and Sigmoid Belief Networks. |
| Researcher Affiliation | Academia | Vu Nguyen (University of Oxford, vu@robots.ox.ac.uk); Vaden Masrani (University of British Columbia, vadmas@cs.ubc.ca); Rob Brekelmans (USC Information Sciences Institute, brekelma@usc.edu); Michael A. Osborne (University of Oxford, mosb@robots.ox.ac.uk); Frank Wood (University of British Columbia, fwood@cs.ubc.ca) |
| Pseudocode | Yes | Algorithm 1 GP-bandit for TVO (high level) |
| Open Source Code | Yes | Our code is available at http://github.com/ntienvu/tvo_gp_bandit. |
| Open Datasets | Yes | We demonstrate the effectiveness of our method for training VAEs [17] on MNIST and Fashion MNIST, and a Sigmoid Belief Network [27] on binarized MNIST and binarized Omniglot, using the TVO objective. |
| Dataset Splits | No | The paper mentions evaluating on 'test log evidence' and 'test KL divergence' but does not specify the explicit train/validation/test dataset splits (e.g., percentages or sample counts) used for reproduction. |
| Hardware Specification | No | The paper mentions using computational resources from 'West Grid' and 'Compute Canada' in the acknowledgments, but it does not provide specific hardware details such as exact GPU/CPU models, processor types, or memory amounts used for the experiments. |
| Software Dependencies | No | The paper does not provide specific software dependencies, such as library names with version numbers (e.g., 'Python 3.8, PyTorch 1.9'), needed to replicate the experiment. |
| Experiment Setup | Yes | All continuous VAEs use a two-layer encoder and decoder with 200 hidden units per layer, and a 20-dimensional latent space. All experiments use Adam [16] optimizer with a learning rate of 1e-3, with gradient clipping at 100. We evaluate our GP-bandit for S ∈ {10, 50} and d ∈ {2, 5, 10, 15} and, for each configuration, train until convergence using 5 random seeds. We set the update frequency w = 6 initially and increment w by one after every 10 bandit iterations to account for smaller objective changes later in training, and update early if ΔL_t ≥ 0.05. We found that selecting β_j too close to either 0 or 1 could negatively affect performance, and thus restrict β ∈ [0.05, 0.95]^d in all experiments. |
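
The "Pseudocode" row above cites Algorithm 1, "GP-bandit for TVO (high level)". A minimal sketch of such a bandit loop is given below, assuming a scikit-learn Gaussian process surrogate, a UCB acquisition over random candidate schedules, and a placeholder `train_tvo_epochs` function standing in for TVO training; these specifics are illustrative assumptions, not the paper's exact algorithm.

```python
# Hypothetical GP-bandit loop for choosing TVO integration points
# beta in [0.05, 0.95]^d. The surrogate, the UCB acquisition, and
# train_tvo_epochs() are illustrative stand-ins, not the paper's
# exact Algorithm 1.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

d = 5                # number of integration points (the paper sweeps d in {2, 5, 10, 15})
w = 6                # TVO training epochs between bandit updates (initial value from the paper)
n_bandit_iters = 30
rng = np.random.default_rng(0)

def train_tvo_epochs(beta, epochs):
    """Placeholder: train the model with the TVO under schedule `beta`
    for `epochs` epochs and return the resulting objective value."""
    return -float(np.sum((np.sort(beta) - np.linspace(0.1, 0.9, beta.size)) ** 2))

X, y = [], []                                # observed (schedule, objective) pairs
beta = np.sort(rng.uniform(0.05, 0.95, d))   # initial schedule
gp = GaussianProcessRegressor(normalize_y=True)

for t in range(n_bandit_iters):
    obj = train_tvo_epochs(beta, w)
    X.append(beta)
    y.append(obj)
    gp.fit(np.array(X), np.array(y))

    # Pick the next schedule by maximizing a UCB acquisition over random
    # candidates kept inside [0.05, 0.95]^d, matching the paper's constraint.
    candidates = np.sort(rng.uniform(0.05, 0.95, size=(512, d)), axis=1)
    mu, std = gp.predict(candidates, return_std=True)
    beta = candidates[np.argmax(mu + 2.0 * std)]

    # The paper increments the update frequency w by one every 10 bandit iterations.
    if (t + 1) % 10 == 0:
        w += 1
```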
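
The "Experiment Setup" row quotes the model architecture and optimizer settings. The PyTorch sketch below is one plausible reading of that description (two-layer encoder and decoder with 200 hidden units per layer, a 20-dimensional latent space, Adam at 1e-3, gradient clipping at 100); the layer names, tanh activations, Bernoulli decoder, and norm-based clipping are assumptions, not the authors' released code.

```python
# Illustrative VAE matching the quoted setup; activations, the Bernoulli
# decoder, and norm-based gradient clipping are assumptions.
import torch
import torch.nn as nn

class VAE(nn.Module):
    def __init__(self, x_dim=784, h_dim=200, z_dim=20):
        super().__init__()
        # Two-layer encoder producing the mean and log-variance of q(z|x).
        self.encoder = nn.Sequential(
            nn.Linear(x_dim, h_dim), nn.Tanh(),
            nn.Linear(h_dim, h_dim), nn.Tanh(),
            nn.Linear(h_dim, 2 * z_dim),
        )
        # Two-layer decoder producing Bernoulli logits over pixels.
        self.decoder = nn.Sequential(
            nn.Linear(z_dim, h_dim), nn.Tanh(),
            nn.Linear(h_dim, h_dim), nn.Tanh(),
            nn.Linear(h_dim, x_dim),
        )

    def forward(self, x):
        mu, logvar = self.encoder(x).chunk(2, dim=-1)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()   # reparameterization
        return self.decoder(z), mu, logvar

model = VAE()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

def training_step(x, loss_fn):
    """One optimizer step with the quoted gradient clipping at 100.
    `loss_fn` stands in for the training objective (e.g., the TVO)."""
    optimizer.zero_grad()
    logits, mu, logvar = model(x)
    loss = loss_fn(logits, x, mu, logvar)
    loss.backward()
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=100.0)
    optimizer.step()
    return loss.item()
```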