Lenient Regret and Good-Action Identification in Gaussian Process Bandits
Authors: Xu Cai, Selwyn Gomes, Jonathan Scarlett
ICML 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | On the theoretical side, we study various lenient regret notions in which all near-optimal actions incur zero penalty, and provide upper bounds on the lenient regret for GP-UCB and an elimination algorithm, circumventing the usual O(√T) term (with time horizon T) resulting from zooming extremely close towards the function maximum. In addition, we complement these upper bounds with algorithm-independent lower bounds. On the practical side, we consider the problem of finding a single good action according to a known pre-specified threshold, and introduce several good-action identification algorithms that exploit knowledge of the threshold. We experimentally find that such algorithms can often find a good action faster than standard optimization-based approaches. (A hedged formula sketch of the lenient-regret notion appears after the table.) |
| Researcher Affiliation | Academia | ¹Department of Computer Science, National University of Singapore; ²Department of Mathematics & Institute of Data Science, National University of Singapore. |
| Pseudocode | No | The paper describes algorithms in prose (e.g., steps for the Elimination Algorithm in Section 2.4) but does not provide structured pseudocode or an explicitly labeled algorithm block. |
| Open Source Code | Yes | The code can be found at https://github.com/caitree/GoodAction. |
| Open Datasets | Yes | "We consider a variety of widely-used synthetic functions whose descriptions can be found at (Bingham, 2021)." and "We consider tuning a regression task using XGBoost (Chen & Guestrin, 2016) on the well-known Boston housing dataset." |
| Dataset Splits | Yes | We perform 3-fold cross-validation, using a fixed seed in order to provide deterministic behavior. (A minimal cross-validation sketch appears after the table.) |
| Hardware Specification | No | The paper does not specify the exact hardware (e.g., GPU/CPU models, RAM) used for running the experiments. |
| Software Dependencies | No | The paper mentions “SciPy optimizer” and “XGBoost” but does not provide specific version numbers for these or other key software components used in the experiments. |
| Experiment Setup | Yes | In this experiment (but not later ones), we consider the case of fixed and known kernel hyperparameters, since our theory assumes this. Since the theoretical choice of β_t is known to be overly conservative (Srinivas et al., 2010), we manually set β_t^{1/2} = √(log(2t)³) in both algorithms. ... For GP-UCB, we set β_t^{1/2} = log t, which we found to provide a suitable exploration/exploitation trade-off. The hyperparameters are updated every 3 iterations by optimizing the log-likelihood (Rasmussen, 2006) within the range ℓ ∈ [10⁻³, 1] and σ_SE ∈ [5×10⁻², 1.5] using the built-in SciPy optimizer based on L-BFGS-B. ... We optimize the acquisition functions using the built-in SciPy optimizer with 10 random restarts. (A sketch of this acquisition-optimization step appears after the table.) |
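
As a concrete illustration of the lenient-regret notion described in the abstract quoted above, the following is a hedged sketch of one natural instantiation with a near-optimality threshold ε. It reflects only the abstract's statement that all near-optimal actions incur zero penalty; the paper's exact definitions may differ.

```latex
% Hedged sketch: two candidate lenient-regret notions with threshold \epsilon.
% These follow the abstract's description (near-optimal actions incur zero
% penalty) and are not taken verbatim from the paper.
\[
  \bar{R}_{\epsilon}(T) \;=\; \sum_{t=1}^{T} \max\bigl(0,\; f(x^{\ast}) - \epsilon - f(x_t)\bigr),
  \qquad
  \bar{R}^{\,0\text{-}1}_{\epsilon}(T) \;=\; \sum_{t=1}^{T} \mathbf{1}\bigl\{ f(x_t) < f(x^{\ast}) - \epsilon \bigr\}.
\]
% Any action x_t with f(x_t) >= f(x^*) - \epsilon contributes zero, so the regret
% does not force the algorithm to zoom arbitrarily close to the maximum.
```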
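
For the Dataset Splits row, here is a minimal Python sketch (not the authors' code; the function name and hyperparameter dictionary are illustrative) of a deterministic 3-fold cross-validation objective for an XGBoost regressor, matching the quoted use of a fixed seed.

```python
# Minimal sketch of the 3-fold CV objective implied by the table (illustrative,
# not the authors' code). A fixed seed makes the fold assignment deterministic.
import numpy as np
from sklearn.model_selection import KFold
from sklearn.metrics import mean_squared_error
from xgboost import XGBRegressor


def cv_rmse(params, X, y, seed=0):
    """Mean 3-fold cross-validation RMSE for one XGBoost hyperparameter setting."""
    kf = KFold(n_splits=3, shuffle=True, random_state=seed)  # fixed seed => same folds every call
    fold_scores = []
    for train_idx, val_idx in kf.split(X):
        model = XGBRegressor(random_state=seed, **params)
        model.fit(X[train_idx], y[train_idx])
        preds = model.predict(X[val_idx])
        fold_scores.append(mean_squared_error(y[val_idx], preds) ** 0.5)
    return float(np.mean(fold_scores))
```

In a good-action identification setting, a function like `cv_rmse` would serve as the black-box objective that the bandit algorithm queries at each chosen hyperparameter configuration.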
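
For the Experiment Setup row, the following hedged Python sketch (not the authors' implementation; `gp_mean` and `gp_std` are assumed callables returning the GP posterior mean and standard deviation) shows a GP-UCB acquisition with β_t^{1/2} = log t, maximized by SciPy's L-BFGS-B from 10 random restarts, as described in the quoted setup.

```python
# Hedged sketch of the acquisition step described above (not the authors' code).
import numpy as np
from scipy.optimize import minimize


def ucb(x, gp_mean, gp_std, t):
    """GP-UCB score mu_t(x) + beta_t^{1/2} * sigma_t(x) with beta_t^{1/2} = log t."""
    beta_sqrt = np.log(max(t, 2))  # guard so the coefficient is positive at t = 1
    return float(gp_mean(x) + beta_sqrt * gp_std(x))


def maximize_acquisition(gp_mean, gp_std, t, bounds, n_restarts=10, seed=None):
    """Maximize the UCB acquisition with L-BFGS-B from random restarts."""
    rng = np.random.default_rng(seed)
    best_x, best_val = None, -np.inf
    for _ in range(n_restarts):
        x0 = np.array([rng.uniform(lo, hi) for lo, hi in bounds])
        res = minimize(lambda x: -ucb(x, gp_mean, gp_std, t),
                       x0, method="L-BFGS-B", bounds=bounds)
        if -res.fun > best_val:
            best_x, best_val = res.x, -res.fun
    return best_x, best_val
```

The kernel-hyperparameter step quoted above (bounded log-likelihood maximization every 3 iterations) would typically reuse the same L-BFGS-B routine with the bounds ℓ ∈ [10⁻³, 1] and σ_SE ∈ [5×10⁻², 1.5].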