Gaussian Process Bandit Optimization of the Thermodynamic Variational Objective

Authors: Vu Nguyen, Vaden Masrani, Rob Brekelmans, Michael Osborne, Frank Wood

NeurIPS 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Empirical validation of our algorithm is provided in terms of improved learning and inference in Variational Autoencoders and Sigmoid Belief Networks.
Researcher Affiliation | Academia | Vu Nguyen, University of Oxford (vu@robots.ox.ac.uk); Vaden Masrani, University of British Columbia (vadmas@cs.ubc.ca); Rob Brekelmans, USC Information Sciences Institute (brekelma@usc.edu); Michael A. Osborne, University of Oxford (mosb@robots.ox.ac.uk); Frank Wood, University of British Columbia (fwood@cs.ubc.ca)
Pseudocode | Yes | Algorithm 1: GP-bandit for TVO (high level)
Open Source Code | Yes | Our code is available at http://github.com/ntienvu/tvo_gp_bandit.
Open Datasets | Yes | We demonstrate the effectiveness of our method for training VAEs [17] on MNIST and Fashion MNIST, and a Sigmoid Belief Network [27] on binarized MNIST and binarized Omniglot, using the TVO objective.
Dataset Splits | No | The paper mentions evaluating on 'test log evidence' and 'test KL divergence' but does not specify the explicit train/validation/test dataset splits (e.g., percentages or sample counts) used for reproduction.
Hardware Specification | No | The paper mentions using computational resources from WestGrid and Compute Canada in the acknowledgments, but it does not provide specific hardware details such as exact GPU/CPU models, processor types, or memory amounts used for the experiments.
Software Dependencies | No | The paper does not provide specific software dependencies, such as library names with version numbers (e.g., 'Python 3.8, PyTorch 1.9'), needed to replicate the experiments.
Experiment Setup | Yes | All continuous VAEs use a two-layer encoder and decoder with 200 hidden units per layer and a 20-dimensional latent space. All experiments use the Adam [16] optimizer with a learning rate of 1e-3 and gradient clipping at 100. We evaluate our GP-bandit for S ∈ {10, 50} and d ∈ {2, 5, 10, 15} and, for each configuration, train until convergence using 5 random seeds. We set the update frequency w = 6 initially and increment w by one after every 10 bandit iterations to account for smaller objective changes later in training, and update early if ∆L_t ≥ 0.05. We found that selecting β_j too close to either 0 or 1 could negatively affect performance, and thus restrict β ∈ [0.05, 0.95]^d in all experiments.
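
As a concrete reading of the Experiment Setup row, the following is a minimal PyTorch sketch of the stated architecture and optimizer settings (two-layer encoder/decoder, 200 hidden units, 20-dimensional latent space, Adam at 1e-3, gradient clipping at 100). The tanh activations, the 784-dimensional flattened-image input, and all class and function names are illustrative assumptions, not the authors' released code.

```python
import torch
import torch.nn as nn

LATENT_DIM = 20   # 20-dimensional latent space (from the paper)
HIDDEN = 200      # 200 hidden units per layer (from the paper)

class Encoder(nn.Module):
    """Two-layer encoder producing Gaussian mean and log-variance (activation is an assumption)."""
    def __init__(self, input_dim=784):  # 784 = flattened 28x28 image, assumed
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(input_dim, HIDDEN), nn.Tanh(),
            nn.Linear(HIDDEN, HIDDEN), nn.Tanh(),
        )
        self.mu = nn.Linear(HIDDEN, LATENT_DIM)
        self.logvar = nn.Linear(HIDDEN, LATENT_DIM)

    def forward(self, x):
        h = self.net(x)
        return self.mu(h), self.logvar(h)

class Decoder(nn.Module):
    """Two-layer decoder mapping latents back to Bernoulli logits (likelihood choice assumed)."""
    def __init__(self, output_dim=784):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(LATENT_DIM, HIDDEN), nn.Tanh(),
            nn.Linear(HIDDEN, HIDDEN), nn.Tanh(),
            nn.Linear(HIDDEN, output_dim),
        )

    def forward(self, z):
        return self.net(z)

encoder, decoder = Encoder(), Decoder()
params = list(encoder.parameters()) + list(decoder.parameters())
optimizer = torch.optim.Adam(params, lr=1e-3)  # Adam with learning rate 1e-3

def step(loss):
    """One optimization step with gradient clipping at 100, as stated in the setup."""
    optimizer.zero_grad()
    loss.backward()
    torch.nn.utils.clip_grad_norm_(params, max_norm=100.0)
    optimizer.step()
```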
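
The bandit-related settings in the same row (schedule dimension d, the β ∈ [0.05, 0.95]^d restriction, and the update frequency w = 6 incremented every 10 bandit iterations) can likewise be illustrated with a rough GP-bandit loop. The sketch below is an assumption-laden illustration of the general technique, not the paper's Algorithm 1: it uses scikit-learn's GaussianProcessRegressor with a GP-UCB acquisition, `train_for_w_epochs` is a hypothetical placeholder for training the model under a fixed schedule, and the improvement-based reward is a guessed stand-in for the paper's reward definition.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

D = 5                      # number of schedule points beta_1..beta_d (one of the tested values)
BOUNDS = (0.05, 0.95)      # beta restricted to [0.05, 0.95]^d
betas_seen, rewards = [], []
gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True)

def propose_beta(n_candidates=2000, kappa=2.0):
    """Pick the candidate schedule maximizing a GP-UCB score over random candidates."""
    cands = np.random.uniform(*BOUNDS, size=(n_candidates, D))
    cands.sort(axis=1)                       # keep each schedule in increasing order
    if not betas_seen:                       # no observations yet: return a random schedule
        return cands[0]
    mu, std = gp.predict(cands, return_std=True)
    return cands[np.argmax(mu + kappa * std)]

w = 6                                        # initial update frequency, in epochs
prev_obj = None
for t in range(50):                          # bandit iterations
    beta = propose_beta()
    obj = train_for_w_epochs(beta, w)        # hypothetical: train with the TVO at this schedule
    reward = obj if prev_obj is None else obj - prev_obj   # improvement used as bandit reward (assumed)
    prev_obj = obj
    betas_seen.append(beta)
    rewards.append(reward)
    gp.fit(np.array(betas_seen), np.array(rewards))
    if (t + 1) % 10 == 0:                    # increment w after every 10 bandit iterations
        w += 1
```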