Tight First- and Second-Order Regret Bounds for Adversarial Linear Bandits

Authors: Shinji Ito, Shuichi Hirahara, Tasuku Soma, Yuichi Yoshida

NeurIPS 2020

Reproducibility Variable Result LLM Response
Research Type Theoretical We propose novel algorithms with first- and second-order regret bounds for adversarial linear bandits. These regret bounds imply that our algorithms perform well when there is an action achieving a small cumulative loss or when the losses have small variance. In addition, we need only assumptions weaker than those of existing algorithms: our algorithms work on discrete action sets as well as continuous ones without a priori knowledge about losses, and they run efficiently if a linear optimization oracle for the action set is available. These results are obtained by combining optimistic online optimization, continuous multiplicative weight update methods, and a novel technique that we refer to as distribution truncation. We also show that the regret bounds of our algorithms are tight up to polylogarithmic factors. In this paper, we present newly devised, efficient algorithms with improved first- and second-order regret bounds for the adversarial linear bandit problem. Our algorithms not only yield better regret bounds but also make fewer assumptions than previous studies.
Researcher Affiliation Collaboration Shinji Ito, NEC Corporation (i-shinji@nec.com); Shuichi Hirahara, National Institute of Informatics (s_hirahara@nii.ac.jp); Tasuku Soma, The University of Tokyo (tasuku_soma@mist.i.u-tokyo.ac.jp); Yuichi Yoshida, National Institute of Informatics (yyoshida@nii.ac.jp)
Pseudocode Yes Algorithm 1: An algorithm for adversarial linear bandits with predicted losses
1: for t = 1, 2, ..., T do
2:   repeat
3:     Pick x_t from the distribution p_t defined by (5).
4:   until x_t^T S(p_t)^{-1} x_t <= d γ_t^2
5:   Choose a_t ∈ A so that E[a_t] = x_t, play a_t, and receive the loss ⟨ℓ_t, a_t⟩ as feedback.
6:   Compute an unbiased estimator ℓ̂_t of ℓ_t as ℓ̂_t = m_t + ⟨ℓ_t − m_t, a_t⟩ S(p_t)^{−1} x_t.
7:   Update p_t as in (5).
8: end for
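The inner steps of the pseudocode above can be sketched in NumPy. This is a minimal illustration, not the paper's implementation: it assumes a finite action set (so choosing a_t = x_t trivially satisfies E[a_t] = x_t), takes the distribution p_t as given rather than deriving it from the paper's Eq. (5), and all function names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def covariance(p, actions):
    # S(p) = E_{a ~ p}[a a^T], the second-moment matrix of the action distribution.
    return sum(pa * np.outer(a, a) for pa, a in zip(p, actions))

def sample_truncated(p, actions, gamma, d):
    # Rejection-sample x_t ~ p_t until x^T S(p)^{-1} x <= d * gamma^2,
    # mimicking the "distribution truncation" step (lines 2-4 of Algorithm 1).
    S_inv = np.linalg.inv(covariance(p, actions))
    while True:
        x = actions[rng.choice(len(actions), p=p)]
        if x @ S_inv @ x <= d * gamma ** 2:
            return x, S_inv

def loss_estimate(loss, m, a, S_inv, x):
    # Estimator from line 6: l_hat_t = m_t + <l_t - m_t, a_t> S(p_t)^{-1} x_t,
    # where m_t is the predicted loss vector.
    return m + np.dot(loss - m, a) * (S_inv @ x)
```

Without the truncation step, the estimator is unbiased because E[⟨ℓ − m, a⟩ S(p)^{−1} x] = S(p)^{−1} E[a a^T] (ℓ − m) = ℓ − m; the truncation introduces a small, controlled bias analyzed in the paper.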
Open Source Code No The paper does not provide any specific repository link, explicit code release statement, or mention of code in supplementary materials for the described methodology.
Open Datasets No The paper is a theoretical work focusing on algorithms and regret bounds for adversarial linear bandits, and it does not use or refer to any specific datasets for training or evaluation.
Dataset Splits No The paper is a theoretical work and does not involve empirical evaluation with datasets, thus no dataset split information is provided.
Hardware Specification No The paper is a theoretical work on algorithms and regret bounds, and thus does not describe any specific hardware used for running experiments.
Software Dependencies No The paper is theoretical and focuses on algorithm design and mathematical proofs; it does not mention specific software dependencies or versions required to replicate any experiments.
Experiment Setup No The paper is theoretical and describes algorithms and their regret bounds; it does not include details on an experimental setup, hyperparameters, or training configurations.