Finite-Time Frequentist Regret Bounds of Multi-Agent Thompson Sampling on Sparse Hypergraphs
Authors: Tianyuan Jin, Hao-Lun Hsu, William Chang, Pan Xu
AAAI 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We prove that ϵ-MATS achieves a worst-case frequentist regret bound that is sublinear in both the time horizon and the local arm size. We also derive a lower bound for this setting, which implies our frequentist regret upper bound is optimal up to constant and logarithm terms, when the hypergraph is sufficiently sparse. Thorough experiments on standard MAMAB problems demonstrate the superior performance and the improved computational efficiency of ϵ-MATS compared with existing algorithms in the same setting. We further conduct extensive experiments on various MAMAB problems |
| Researcher Affiliation | Academia | Tianyuan Jin (National University of Singapore), Hao-Lun Hsu (Duke University), William Chang (University of California, Los Angeles), Pan Xu (Duke University). Emails: tianyuan@u.nus.edu, hao-lun.hsu@duke.edu, chang314@g.ucla.edu, pan.xu@duke.edu |
| Pseudocode | Yes | In this section, we present the ϵ-exploring Multi-Agent Thompson Sampling algorithm (ϵ-MATS), whose pseudocode is displayed in Algorithm 1. (An illustrative sketch of this procedure appears after the table.) |
| Open Source Code | No | The paper does not provide a specific statement or link indicating that the source code for the proposed ϵ-MATS algorithm is publicly available. |
| Open Datasets | Yes | In this section, we evaluate the proposed ϵ-MATS algorithm on several benchmark MAMAB problems including Bernoulli 0101-Chain, Poisson 0101-Chain, and Gem Mining (Roijers, Whiteson, and Oliehoek 2015; Bargiacchi et al. 2018; Verstraeten et al. 2020). |
| Dataset Splits | No | The paper describes experiments on multi-armed bandit problems, which involve online learning rather than traditional supervised learning dataset splits. It does not specify predefined training, validation, or test dataset splits with percentages or sample counts. |
| Hardware Specification | Yes | We run all our experiments on Nvidia RTX A5000 with 24GB RAM and each experiment setting is averaged over 50 trials. |
| Software Dependencies | No | The paper mentions the algorithms and problems studied but does not specify any software dependencies (e.g., programming languages, libraries, or frameworks) with their version numbers. |
| Experiment Setup | Yes | Here ϵ ∈ (0, 1] is a user-specified parameter that controls the level of exploration. In the rest of the experiments in this subsection, we fix ϵ = 0.1 for the best performance. We run all our experiments on Nvidia RTX A5000 with 24GB RAM and each experiment setting is averaged over 50 trials. Please refer to Appendix G for detailed experimental setup, ablation studies, and more experimental results. |
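
The pseudocode and experiment-setup rows above describe ϵ-MATS as performing MATS-style Thompson sampling with probability ϵ and acting greedily on posterior means otherwise, with ϵ = 0.1 used in the reported experiments. The sketch below is a minimal, illustrative rendering of that loop on a toy chain-structured Bernoulli MAMAB; it is not the authors' implementation. The problem instance (`n_agents`, `groups`, `true_means`), the Beta(1, 1) priors, and the brute-force joint maximization (standing in for the variable-elimination step the paper relies on) are all assumptions made for this example.

```python
import numpy as np

rng = np.random.default_rng(0)

# --- Toy chain-structured MAMAB (a stand-in for the paper's Bernoulli 0101-Chain) ---
n_agents = 5           # agents on a chain; local groups are adjacent pairs (i, i+1)
n_actions = 2          # binary action per agent
groups = [(i, i + 1) for i in range(n_agents - 1)]
# Unknown Bernoulli means, one per (group, local joint action) pair
true_means = rng.uniform(0.1, 0.9, size=(len(groups), n_actions, n_actions))

def joint_value(theta, actions):
    """Sum of local values of a joint action over all groups."""
    return sum(theta[g][actions[i], actions[j]] for g, (i, j) in enumerate(groups))

def argmax_joint(theta):
    """Brute-force joint maximization (the paper uses variable elimination instead)."""
    best, best_val = None, -np.inf
    for idx in range(n_actions ** n_agents):
        actions = [(idx >> k) & 1 for k in range(n_agents)]
        val = joint_value(theta, actions)
        if val > best_val:
            best, best_val = actions, val
    return best

# Beta(1, 1) prior for every (group, local joint action)
alpha = np.ones_like(true_means)
beta = np.ones_like(true_means)

eps, horizon = 0.1, 2000
regret = 0.0
opt_value = joint_value(true_means, argmax_joint(true_means))

for t in range(horizon):
    if rng.random() < eps:
        # MATS exploration step: sample local means from the Beta posteriors
        theta = rng.beta(alpha, beta)
    else:
        # Greedy step: use posterior means instead of posterior samples
        theta = alpha / (alpha + beta)
    actions = argmax_joint(theta)
    # Observe one Bernoulli reward per group and update that group's posterior
    for g, (i, j) in enumerate(groups):
        r = rng.random() < true_means[g, actions[i], actions[j]]
        alpha[g, actions[i], actions[j]] += r
        beta[g, actions[i], actions[j]] += 1 - r
    regret += opt_value - joint_value(true_means, actions)

print(f"cumulative regret after {horizon} rounds: {regret:.1f}")
```

Averaging such a run over 50 random seeds would mirror the paper's reporting convention of averaging each experiment setting over 50 trials; the split between posterior sampling (probability ϵ) and posterior-mean greedy play is the only difference from plain MATS in this sketch.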