Tight First- and Second-Order Regret Bounds for Adversarial Linear Bandits
Authors: Shinji Ito, Shuichi Hirahara, Tasuku Soma, Yuichi Yoshida
NeurIPS 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | We propose novel algorithms with first- and second-order regret bounds for adversarial linear bandits. These regret bounds imply that our algorithms perform well when there is an action achieving a small cumulative loss or the loss has a small variance. In addition, we need only assumptions weaker than those of existing algorithms; our algorithms work on discrete action sets as well as continuous ones without a priori knowledge about losses, and they run efficiently if a linear optimization oracle for the action set is available. These results are obtained by combining optimistic online optimization, continuous multiplicative weight update methods, and a novel technique that we refer to as distribution truncation. We also show that the regret bounds of our algorithms are tight up to polylogarithmic factors. In this paper, we present newly devised, efficient algorithms with improved first- and second-order regret bounds for the adversarial linear bandit problem. Our algorithms not only yield better regret bounds but also rely on fewer assumptions than previous studies. |
| Researcher Affiliation | Collaboration | Shinji Ito (NEC Corporation, i-shinji@nec.com); Shuichi Hirahara (National Institute of Informatics, s_hirahara@nii.ac.jp); Tasuku Soma (The University of Tokyo, tasuku_soma@mist.i.u-tokyo.ac.jp); Yuichi Yoshida (National Institute of Informatics, yyoshida@nii.ac.jp) |
| Pseudocode | Yes | Algorithm 1 (An algorithm for adversarial linear bandits with predicted loss): 1: for t = 1, 2, ..., T do; 2: repeat; 3: Pick x_t from the distribution p_t defined by (5); 4: until ‖x_t‖²_{S(p_t)⁻¹} ≤ dγ_t²; 5: Choose a_t ∈ A so that E[a_t] = x_t, play a_t, and receive the loss ⟨ℓ_t, a_t⟩ as feedback; 6: Compute an unbiased estimator ℓ̂_t of ℓ_t as ℓ̂_t = m_t + ⟨ℓ_t − m_t, a_t⟩ S(p̃_t)⁻¹ x_t, where p̃_t is the truncated distribution; 7: Update p_t as in (5); 8: end for. (A hedged code sketch of these steps appears after this table.) |
| Open Source Code | No | The paper does not provide any specific repository link, explicit code release statement, or mention of code in supplementary materials for the described methodology. |
| Open Datasets | No | The paper is a theoretical work focusing on algorithms and regret bounds for adversarial linear bandits, and it does not use or refer to any specific datasets for training or evaluation. |
| Dataset Splits | No | The paper is a theoretical work and does not involve empirical evaluation with datasets, thus no dataset split information is provided. |
| Hardware Specification | No | The paper is a theoretical work on algorithms and regret bounds, and thus does not describe any specific hardware used for running experiments. |
| Software Dependencies | No | The paper is theoretical and focuses on algorithm design and mathematical proofs; it does not mention specific software dependencies or versions required to replicate any experiments. |
| Experiment Setup | No | The paper is theoretical and describes algorithms and their regret bounds; it does not include details on an experimental setup, hyperparameters, or training configurations. |
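To make the sampling and estimation steps of Algorithm 1 concrete, here is a minimal Python sketch. It assumes a finite action set (so the continuous exponential-weights distribution p_t can be held as a weight vector over actions), a running-mean predictor for m_t, an arbitrary constant in place of the threshold dγ_t², and S(p_t) in place of S(p̃_t) in the estimator. All of these are illustrative simplifications, not the paper's construction: the paper's update (5) is a continuous multiplicative weight update over the convex hull of the action set, implemented via a linear optimization oracle.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative setup: these constants and the finite action set are assumptions.
d, T, eta = 3, 200, 0.1
thresh = 4.0 * d                          # stands in for d * gamma_t^2
A = rng.normal(size=(50, d))              # rows are the actions a in A
A /= np.linalg.norm(A, axis=1, keepdims=True)
base_loss = np.array([0.3, -0.1, 0.2])    # hypothetical adversary mean

cum_est = np.zeros(d)                     # sum of loss estimates so far
for t in range(T):
    m_t = cum_est / max(t, 1)             # crude optimistic predictor (assumption)

    # Exponential weights with the optimistic term m_t, standing in for update (5).
    scores = -eta * (A @ (cum_est + m_t))
    p = np.exp(scores - scores.max())
    p /= p.sum()

    # S(p_t) = E_{x ~ p_t}[x x^T]; small ridge term for numerical safety.
    S = (A * p[:, None]).T @ A
    S_inv = np.linalg.inv(S + 1e-9 * np.eye(d))

    # Distribution truncation: resample until x_t^T S(p_t)^{-1} x_t <= threshold.
    # Acceptance is quick here, since E[x^T S(p)^{-1} x] = d under p.
    while True:
        x_t = A[rng.choice(len(A), p=p)]
        if x_t @ S_inv @ x_t <= thresh:
            break

    a_t = x_t                             # x_t is itself an action, so E[a_t] = x_t
    loss = float((base_loss + 0.1 * rng.normal(size=d)) @ a_t)

    # Unbiased estimator: l_hat_t = m_t + <l_t - m_t, a_t> * S^{-1} x_t.
    l_hat = m_t + (loss - m_t @ a_t) * (S_inv @ x_t)
    cum_est += l_hat
```

Intuitively, the rejection loop implements the paper's distribution-truncation idea: discarding draws with large ‖x_t‖²_{S(p_t)⁻¹} keeps the importance-weighted estimator ℓ̂_t from blowing up, which is what supports the variance-dependent (second-order) guarantee.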