Improved Regret Bounds for Bandit Combinatorial Optimization
Authors: Shinji Ito, Daisuke Hatano, Hanna Sumita, Kei Takemura, Takuro Fukunaga, Naonori Kakimura, Ken-ichi Kawarabayashi
NeurIPS 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | Our main contribution is that we managed to improve this lower bound by a factor of √(log T), to Ω(√(dk³T)), by means of strongly correlated losses with binary values. This bound yields better regret lower bounds for three specific examples of bandit combinatorial optimization: the multitask bandit, the bandit ranking, and the multiple-play bandit. In particular, the bound obtained for the bandit ranking addresses an open problem raised in [8]. In addition, we demonstrate that the problem becomes easier without correlations among the entries of the loss vectors: if each entry of the loss vector is an independent random variable, one can achieve a regret of O(√(dk²T)), which is √k times smaller than the lower bound above. These results indicate that correlation among losses is the cause of the large regret. |
| Researcher Affiliation | Collaboration | Shinji Ito (NEC Corporation, The University of Tokyo, i-shinji@nec.com); Daisuke Hatano (RIKEN AIP, daisuke.hatano@riken.jp); Hanna Sumita (Tokyo Metropolitan University, sumita@tmu.ac.jp); Kei Takemura (NEC Corporation, kei_takemura@nec.com); Takuro Fukunaga (Chuo University, RIKEN AIP, JST PRESTO, fukunaga.07s@g.chuo-u.ac.jp); Naonori Kakimura (Keio University, kakimura@math.keio.ac.jp); Ken-ichi Kawarabayashi (National Institute of Informatics, k-keniti@nii.ac.jp) |
| Pseudocode | Yes | The differences between our Algorithm 1 given in Appendix D.1 and Algorithm 12 in [19] are summarized as follows: |
| Open Source Code | No | The paper does not contain any explicit statements or links indicating that source code for their methodology is made publicly available. |
| Open Datasets | No | This paper is theoretical and focuses on proving regret bounds and analyzing probabilistic distributions. It does not describe experiments that would involve using publicly available datasets for training, validation, or testing. |
| Dataset Splits | No | This paper is theoretical and focuses on proving regret bounds and analyzing probabilistic distributions. It does not describe experiments that would involve specific dataset splits for training, validation, or testing. |
| Hardware Specification | No | The paper is theoretical and does not describe any experiments that would require specific hardware for execution. Therefore, no hardware specifications are mentioned. |
| Software Dependencies | No | The paper is theoretical and does not describe any experiments that would require specific software dependencies with version numbers for reproducibility. |
| Experiment Setup | No | The paper is theoretical, focusing on mathematical proofs and analysis of regret bounds. It does not describe an experimental setup with specific hyperparameters or system-level training settings. |
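
The bounds quoted in the Research Type row can be summarized as follows. This is a sketch based only on the quoted text; the precise problem definitions, constants, and conditions are in the paper itself.

```latex
% Bandit combinatorial optimization: the learner picks, over T rounds,
% an action a_t from a set A \subseteq \{0,1\}^d with \|a_t\|_1 = k
% and observes only the incurred loss.

% Improved minimax lower bound (via strongly correlated binary losses),
% removing the \sqrt{\log T} factor from the previous bound:
R_T = \Omega\!\left(\sqrt{d k^3 T}\right)

% Achievable regret when each entry of the loss vector is an
% independent random variable (no correlation among entries):
R_T = O\!\left(\sqrt{d k^2 T}\right)

% The two differ by a factor of \sqrt{k}, which is why correlation
% among losses is identified as the source of the larger regret.
```
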