When Combinatorial Thompson Sampling meets Approximation Regret
Authors: Pierre Perrault
NeurIPS 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | We study the Combinatorial Thompson Sampling policy (CTS) for combinatorial multi-armed bandit problems (CMAB), within an approximation regret setting. Although CTS has attracted a lot of interest, it has a drawback that other usual CMAB policies do not have when considering non-exact oracles: for some oracles, CTS has a poor approximation regret (scaling linearly with the time horizon T) [Wang and Chen, 2018]. A study is then necessary to discriminate the oracles on which CTS could learn. This study was started by Kong et al. [2021]: they gave the first approximation regret analysis of CTS for the greedy oracle, obtaining an upper bound of order O(log(T)/Δ²), where Δ is some minimal reward gap. In this paper, our objective is to push this study further than the simple case of the greedy oracle. We provide the first O(log(T)/Δ) approximation regret upper bound for CTS, obtained under a specific condition on the approximation oracle, allowing a reduction to the exact oracle analysis. We thus term this condition REDUCE2EXACT, and observe that it is satisfied in many concrete examples. Moreover, it can be extended to the probabilistically triggered arms setting, thus capturing even more problems, such as online influence maximization. |
| Researcher Affiliation | Industry | Pierre Perrault Idemia pierre.perrault@idemia.com |
| Pseudocode | Yes | Algorithm 1 CTS-BETA; Algorithm 2 CTS-GAUSSIAN (a hedged sketch of the CTS-BETA loop is given after the table) |
| Open Source Code | No | The paper is theoretical and focuses on analysis and algorithms. It does not mention releasing source code for the methodology. The ethics statement indicates 'N/A' for questions related to code and experiments. |
| Open Datasets | No | The paper is theoretical and does not conduct empirical experiments with datasets. Therefore, it does not discuss public dataset availability for training. The ethics statement explicitly marks as 'N/A' questions about training details. |
| Dataset Splits | No | The paper is theoretical and does not conduct empirical experiments with datasets. Therefore, it does not provide dataset split information for validation. The ethics statement explicitly marks as 'N/A' questions about training details. |
| Hardware Specification | No | The paper is theoretical and does not involve running empirical experiments; therefore, it does not specify hardware used. The ethics statement explicitly marks as 'N/A' questions about compute resources. |
| Software Dependencies | No | The paper is theoretical and does not involve running empirical experiments requiring specific software versions for reproducibility. The ethics statement explicitly marks as 'N/A' questions about code and instructions for reproducing results. |
| Experiment Setup | No | The paper is theoretical and does not involve an empirical experimental setup with hyperparameters or training settings. The ethics statement explicitly marks as 'N/A' questions about training details. |
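
To make the CTS-BETA pseudocode referenced in the table more concrete, here is a minimal sketch of Combinatorial Thompson Sampling with Beta priors under semi-bandit feedback. The `greedy_oracle` function, the top-k combinatorial structure, the linear reward, and all parameter names below are illustrative assumptions, not the paper's construction; the paper's analysis concerns general approximation oracles satisfying the REDUCE2EXACT condition.

```python
"""Hedged sketch of CTS with Beta priors (CTS-BETA), assuming Bernoulli
base-arm outcomes, semi-bandit feedback, and a toy top-k approximation
oracle. This is an illustration of the general CTS template, not the
algorithm as specified in the paper."""
import numpy as np


def greedy_oracle(theta, k):
    # Placeholder approximation oracle: select the k arms with the
    # largest sampled means. The paper studies general (greedy or other)
    # approximation oracles; this top-k choice is purely illustrative.
    return np.argsort(theta)[-k:]


def cts_beta(true_means, k, horizon, seed=0):
    rng = np.random.default_rng(seed)
    n = len(true_means)
    a = np.ones(n)  # Beta posterior parameter (successes + 1) per base arm
    b = np.ones(n)  # Beta posterior parameter (failures + 1) per base arm
    total_reward = 0.0
    for _ in range(horizon):
        theta = rng.beta(a, b)            # sample a mean vector from the posterior
        action = greedy_oracle(theta, k)  # query the approximation oracle
        outcomes = rng.random(len(action)) < true_means[action]  # Bernoulli feedback
        a[action] += outcomes             # semi-bandit posterior update
        b[action] += 1 - outcomes
        total_reward += outcomes.sum()    # linear reward, for illustration only
    return total_reward


if __name__ == "__main__":
    means = np.array([0.1, 0.2, 0.3, 0.6, 0.7, 0.8])
    print(cts_beta(means, k=3, horizon=5000))
```

CTS-GAUSSIAN follows the same template with Gaussian posteriors on the base-arm means in place of the Beta posteriors above.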