Improved Bayes Regret Bounds for Multi-Task Hierarchical Bayesian Bandit Algorithms
Authors: Jiechao Guan, Hui Xiong
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we conduct experiments in the linear bandit setting to verify our theoretical results. Specifically, we show the influence of hyper-parameters (e.g. m, n, L) on the multi-task Bayes regret of HierTS and HierBayesUCB, to validate the consistency between their regret bounds and practical performance. Besides, we compare the performance of our algorithms against other baselines, to show the effectiveness of hierarchical Bayesian bandit algorithms in the multi-task bandit setting. (...) Experimental Results. From Figure 1, we can observe that: (1) In plot (a), the multi-task regret becomes larger with the increase of m and n, which is consistent with our regret upper bound in Theorem 5.1. |
| Researcher Affiliation | Academia | Jiechao Guan¹, Hui Xiong¹·², ¹AI Thrust, The Hong Kong University of Science and Technology (Guangzhou), China; ²Department of Computer Science and Engineering, HKUST, China. {jiechaoguan, xionghui}@hkust-gz.edu.cn |
| Pseudocode | Yes | Algorithm 1 Hierarchical Bayesian Algorithms for Multi-Task Linear Bandit Setting (...) Algorithm 2 Hierarchical Bayesian Algorithms for Multi-Task Combinatorial Semi-Bandit Setting |
| Open Source Code | Yes | The source code for reproducing all experimental results of Hier TS and Hier Bayes UCB is provided in the supplementary material. |
| Open Datasets | No | The paper uses a synthetic problem setup, not a publicly available dataset with specific access information. 'The synthetic problem is defined as follows. In most experiments, we set the number of total tasks as m = 10, the dimension of action space as d = 4, the number of concurrent tasks as L = 5, the number of rounds as n = 200m/L. We focus on the finite action space with |A| = 10, and each action is sampled uniformly from [-0.5, 0.5]^d.' |
| Dataset Splits | No | The paper describes a synthetic experimental setting with parameters for simulations (e.g., 'number of rounds as n = 200m/L'), but it does not specify explicit training, validation, or test dataset splits in terms of percentages or sample counts for an external dataset. |
| Hardware Specification | Yes | We run all bandit algorithms on a platform with 8 NVIDIA RTX 6000 GPUs and 2 AMD EPYC 7543 processors. Each GPU has 48 GB of memory, and each CPU has 64 cores. |
| Software Dependencies | Yes | The CUDA version is 12.1, the Python version is 3.7.16, the matplotlib version is 3.5.3, and the tensorflow version is 1.15. |
| Experiment Setup | Yes | In most experiments, we set the number of total tasks as m = 10, the dimension of action space as d = 4, the number of concurrent tasks as L = 5, the number of rounds as n = 200m/L. We focus on the finite action space with |A| = 10, and each action is sampled uniformly from [-0.5, 0.5]^d. In the hierarchical Bayesian model, we set the hyper-prior as a zero-mean isotropic Gaussian distribution N(µ_q, Σ_q) = N(0, Σ_q), where Σ_q = σ_q² I_d; and set the task covariance Σ_0 = σ_0² I_d. Unless otherwise stated, we set σ_q = 1, σ_0 = 0.1, σ² = 0.5 for each task in most experiments. |
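The experiment setup above can be sketched in code. This is a minimal illustrative reconstruction of the synthetic hierarchical Bayesian bandit environment using the paper's reported hyper-parameters, not the authors' released implementation; the variable names and the `reward` helper are our own choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hyper-parameters reported in the paper's experiment setup.
m, d, L = 10, 4, 5            # total tasks, action dimension, concurrent tasks
n = 200 * m // L              # number of rounds
num_actions = 10              # |A|
sigma_q, sigma_0, sigma2 = 1.0, 0.1, 0.5

# Hyper-prior: task-mean parameter mu_* ~ N(0, sigma_q^2 I_d).
mu_star = rng.normal(0.0, sigma_q, size=d)

# Per-task parameters: theta_s ~ N(mu_*, sigma_0^2 I_d) for each of the m tasks.
thetas = mu_star + rng.normal(0.0, sigma_0, size=(m, d))

# Finite action set: each action sampled uniformly from [-0.5, 0.5]^d.
actions = rng.uniform(-0.5, 0.5, size=(num_actions, d))

def reward(task: int, action_idx: int) -> float:
    """Noisy linear reward: <theta_task, a> + eps, eps ~ N(0, sigma^2)."""
    return actions[action_idx] @ thetas[task] + rng.normal(0.0, np.sqrt(sigma2))
```

A bandit algorithm such as HierTS would then interact with `reward` over `n` rounds per task, maintaining a posterior over `mu_star` shared across tasks.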