Scalable Thompson Sampling using Sparse Gaussian Process Models
Authors: Sattar Vakili, Henry Moss, Artem Artemev, Vincent Dutordoir, Victor Picheny
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive numerical experiments demonstrate that our algorithm achieves near state-of-the-art regret performance on standard benchmarks. |
| Researcher Affiliation | Academia | University of Cambridge, Imperial College London, Vector Institute, University of Toronto, The Alan Turing Institute |
| Pseudocode | Yes | Algorithm 1: Scalable Sparse GP-TS for Bayesian Optimization (a hedged sketch of such a loop is given below the table) |
| Open Source Code | Yes | Our code is available at https://github.com/lucas-schmied/sgpts |
| Open Datasets | Yes | We benchmark S-GP-TS on common synthetic Bayesian optimization functions as well as a range of practical black-box optimization tasks from DeepGP [27] and BOCS [28]. For real-world datasets, we use the datasets from DeepGP [27], which consist of 9 real-world tasks. The datasets are downloaded from the authors’ GitHub repository. |
| Dataset Splits | No | The paper focuses on Bayesian Optimization, which typically does not involve explicit train/validation/test splits of a static dataset for model training, but rather evaluates performance over BO iterations. No specific dataset splits (e.g., percentages or counts) are mentioned for the benchmark functions or real-world tasks. |
| Hardware Specification | Yes | All experiments were run on an NVIDIA GeForce RTX 2080 Ti GPU. |
| Software Dependencies | No | Our code is implemented in PyTorch [39] and uses the GPyTorch [12] library for Gaussian processes. For the sparse GP components, we use the methods from Spearmint [42] (which relies on GPflow [29]) and GPy [4]. |
| Experiment Setup | Yes | All tasks are evaluated over 50 BO iterations across 10 random seeds. The maximum number of training points for our sparse GPs is set to 200, which leads to a batch size of 20 when using 10 random seeds per iteration. We follow [48] for initialization of the inducing points. We consider three acquisition functions: Expected Improvement (EI) [34], Upper Confidence Bound (UCB) [35] with parameter β = 1.0, and Thompson Sampling (TS) [44]. For the GP hyperparameter optimization, we use 100 Adam [19] iterations with a learning rate of 0.01. (An illustrative snippet of this Adam configuration also follows the table.) |
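
The Pseudocode row above refers to Algorithm 1 (Scalable Sparse GP-TS). The paper's own implementation is not reproduced here; the following is a minimal, hypothetical sketch of a sparse-GP Thompson-sampling loop, assuming a GPflow `SVGP` surrogate, a toy one-dimensional objective, and posterior sampling over a discrete candidate grid. The objective, inducing-point count, iteration counts, and grid are placeholders chosen for illustration only.

```python
import numpy as np
import gpflow


def objective(x):
    """Toy black-box objective (placeholder, not from the paper)."""
    return np.sin(3.0 * x) + 0.1 * np.random.randn(*x.shape)


rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(20, 1))   # initial design
Y = objective(X)

for _ in range(10):                        # BO iterations (illustrative count)
    # Sparse variational GP surrogate with a small set of inducing points.
    Z = X[rng.choice(len(X), size=min(10, len(X)), replace=False)].copy()
    model = gpflow.models.SVGP(
        kernel=gpflow.kernels.Matern52(),
        likelihood=gpflow.likelihoods.Gaussian(),
        inducing_variable=Z,
        num_data=len(X),
    )
    gpflow.optimizers.Scipy().minimize(
        model.training_loss_closure((X, Y)),
        model.trainable_variables,
        options={"maxiter": 100},
    )

    # Thompson sampling: draw one posterior sample on a candidate grid and
    # evaluate the black box at the sample's minimizer.
    candidates = np.linspace(0.0, 1.0, 256).reshape(-1, 1)
    sample = model.predict_f_samples(candidates, num_samples=1)[0].numpy()
    x_next = candidates[[np.argmin(sample)]]

    X = np.vstack([X, x_next])
    Y = np.vstack([Y, objective(x_next)])
```

Sampling the posterior on a dense grid is only workable in low dimension and is used here purely to keep the sketch short; it should not be read as the paper's sampling scheme.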
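
The Experiment Setup row states that GP hyperparameters are optimized with 100 Adam iterations at a learning rate of 0.01. A hypothetical snippet of such a training loop for a GPflow `SVGP` model is shown below, using an explicit TensorFlow gradient tape; the data, kernel, and inducing-point initialization are placeholders and do not follow the [48] scheme cited in the table.

```python
import numpy as np
import tensorflow as tf
import gpflow

rng = np.random.default_rng(1)
X = rng.uniform(0.0, 1.0, size=(200, 2))          # up to 200 training points, as in the table
Y = np.sin(X.sum(axis=1, keepdims=True)) + 0.05 * rng.standard_normal((200, 1))

model = gpflow.models.SVGP(
    kernel=gpflow.kernels.Matern52(lengthscales=[1.0, 1.0]),
    likelihood=gpflow.likelihoods.Gaussian(),
    inducing_variable=X[:20].copy(),              # naive initialization, not the [48] scheme
    num_data=len(X),
)

# 100 Adam iterations with learning rate 0.01, as stated in the Experiment Setup row.
optimizer = tf.optimizers.Adam(learning_rate=0.01)
for _ in range(100):
    with tf.GradientTape() as tape:
        loss = model.training_loss((X, Y))        # negative ELBO of the sparse GP
    grads = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))
```

One common reason to prefer Adam over a full-batch second-order optimizer in this setting is that the SVGP objective supports minibatching, which matters as the number of observations grows.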