Scalable Thompson Sampling using Sparse Gaussian Process Models

Authors: Sattar Vakili, Henry Moss, Artem Artemev, Vincent Dutordoir, Victor Picheny

NeurIPS 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our algorithm achieves near state-of-the-art regret performance on standard benchmarks through extensive numerical experiments.
Researcher Affiliation | Academia | University of Cambridge, Imperial College London, Vector Institute, University of Toronto, The Alan Turing Institute
Pseudocode | Yes | Algorithm 1: Scalable Sparse GP-TS for Bayesian Optimization
Open Source Code | Yes | Our code is available at https://github.com/lucas-schmied/sgpts
Open Datasets | Yes | We benchmark S-GP-TS on common synthetic Bayesian optimization functions as well as a range of practical black-box optimization tasks from DeepGP [27] and BOCS [28]. For real-world data, we use the datasets from DeepGP [27], which comprise 9 real-world tasks and are downloaded from the authors' GitHub repository.
Dataset Splits | No | The paper addresses Bayesian optimization, which does not typically involve explicit train/validation/test splits of a static dataset; performance is instead evaluated over BO iterations. No specific dataset splits (e.g., percentages or counts) are reported for the benchmark functions or real-world tasks.
Hardware Specification | Yes | All experiments were run on an NVIDIA GeForce RTX 2080 Ti GPU.
Software Dependencies | No | Our code is implemented in PyTorch [39] and uses the GPyTorch [12] library for Gaussian processes. For the sparse GP components, we use the methods from Spearmint [42] (which relies on GPflow [29]) and GPy [4].
Experiment Setup | Yes | All tasks are evaluated over 50 BO iterations across 10 random seeds. The maximum number of training points for our sparse GPs is set to 200, which leads to a batch size of 20 when using 10 random seeds per iteration. We follow [48] for initialization of the inducing points. We consider three acquisition functions: Expected Improvement (EI) [34], Upper Confidence Bound (UCB) [35] with parameter β = 1.0, and Thompson Sampling (TS) [44]. For the GP hyperparameter optimization, we use 100 Adam [19] iterations with a learning rate of 0.01.
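
The Pseudocode and Experiment Setup rows above describe a Thompson-sampling loop driven by a sparse (inducing-point) GP surrogate. Below is a minimal, hypothetical sketch of such a loop, written against GPflow's SVGP model (one of the libraries named in the Software Dependencies row); it is not the authors' released implementation. The 100 Adam iterations at learning rate 0.01 and the cap of 200 inducing points follow the setup reported in the table, while the objective, bounds, candidate-grid size, default batch size, and the random-subset initialization of the inducing points (a stand-in for the scheme cited as [48]) are illustrative assumptions.

```python
# Hypothetical sketch only: a batched Thompson-sampling BO loop on top of a
# sparse variational GP (GPflow SVGP). Function names, bounds handling, and
# the candidate grid are assumptions, not the paper's released code.
import numpy as np
import tensorflow as tf
import gpflow


def sparse_gp_thompson_sampling(objective, bounds, n_init=20, n_iters=50,
                                batch_size=20, max_inducing=200,
                                n_candidates=1000, seed=0):
    """Minimise `objective` over the box `bounds` (shape [dim, 2])."""
    rng = np.random.default_rng(seed)
    dim = bounds.shape[0]
    lo, hi = bounds[:, 0], bounds[:, 1]

    # Initial design drawn uniformly from the search space.
    X = rng.uniform(lo, hi, size=(n_init, dim))
    Y = objective(X).reshape(-1, 1)

    for _ in range(n_iters):
        # Sparse GP surrogate with at most `max_inducing` inducing points,
        # here initialised on a random subset of the observed inputs
        # (a simple stand-in for the initialisation scheme cited as [48]).
        Z = X[rng.choice(len(X), size=min(max_inducing, len(X)), replace=False)]
        model = gpflow.models.SVGP(
            kernel=gpflow.kernels.Matern52(lengthscales=np.ones(dim)),
            likelihood=gpflow.likelihoods.Gaussian(),
            inducing_variable=Z,
            num_data=len(X),
        )

        # Hyperparameter / variational optimisation: 100 Adam iterations at
        # learning rate 0.01, matching the Experiment Setup row.
        opt = tf.optimizers.Adam(learning_rate=0.01)
        for _ in range(100):
            with tf.GradientTape() as tape:
                loss = model.training_loss((X, Y))  # negative ELBO
            grads = tape.gradient(loss, model.trainable_variables)
            opt.apply_gradients(zip(grads, model.trainable_variables))

        # Thompson-sampling acquisition: draw `batch_size` joint posterior
        # samples over a random candidate grid and take each sample's
        # minimiser as one point of the evaluation batch.
        candidates = rng.uniform(lo, hi, size=(n_candidates, dim))
        samples = model.predict_f_samples(
            candidates, num_samples=batch_size, full_cov=True
        )  # shape [batch_size, n_candidates, 1]
        batch_idx = tf.argmin(samples[:, :, 0], axis=1).numpy()
        X_batch = candidates[batch_idx]

        # Evaluate the batch and grow the data set.
        X = np.vstack([X, X_batch])
        Y = np.vstack([Y, objective(X_batch).reshape(-1, 1)])

    return X, Y
```

Under these assumptions, drawing independent joint posterior samples and minimising each one is what yields a batch of evaluations per BO iteration; this sampling step is what replaces EI or UCB when Thompson sampling is the acquisition rule.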