Hybrid Regret Bounds for Combinatorial Semi-Bandits and Adversarial Linear Bandits
Authors: Shinji Ito
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | This study aims to develop bandit algorithms that automatically exploit tendencies of certain environments to improve performance, without any prior knowledge regarding the environments. We first propose an algorithm for combinatorial semi-bandits with a hybrid regret bound that includes two main features: a best-of-three-worlds guarantee and multiple data-dependent regret bounds. The former means that the algorithm will work nearly optimally in all environments: an adversarial setting, a stochastic setting, or a stochastic setting with adversarial corruptions. The latter implies that, even if the environment is far from exhibiting stochastic behavior, the algorithm will perform better as long as the environment is "easy" in terms of certain metrics. We also show hybrid data-dependent regret bounds for adversarial linear bandits, which include a first path-length regret bound that is tight up to logarithmic factors. |
| Researcher Affiliation | Industry | Shinji Ito, NEC Corporation, i-shinji@nec.com |
| Pseudocode | Yes | Algorithm 1 (Hybrid algorithm for combinatorial semi-bandits). Require: action set A, time horizon T ∈ ℕ. 1: Initialize m_1 ∈ [0, 1]^d by m_{1i} = 1/2 for all i ∈ [d]. 2: for t = 1, 2, ..., T do 3: Compute x_t as in (2), where ℓ̂_t, ψ_t and β_{ti} are defined in (3) and (4), respectively. 4: Pick a_t so that E[a_t \| x_t] = x_t, output a_t, and get feedback of ℓ_{ti} for each i such that a_{ti} = 1. 5: Compute ℓ̂_t, β_{t+1,i} and m_{t+1} on the basis of (3), (4) and (5), respectively. 6: end for. (A loop-level sketch in Python follows the table below.) |
| Open Source Code | No | The paper does not provide any statement about releasing source code or a link to a code repository for the described methodology. |
| Open Datasets | No | The paper is theoretical and focuses on algorithm design and regret bounds. It does not describe experiments using specific datasets, nor does it provide information about public availability of any datasets for training. |
| Dataset Splits | No | The paper is theoretical and does not describe any experimental validation using data splits. Thus, no information on training/validation/test splits is provided. |
| Hardware Specification | No | The paper is theoretical and does not mention any specific hardware used for running experiments or computations. |
| Software Dependencies | No | The paper does not specify any software names with version numbers (e.g., programming languages, libraries, or solvers) that would be needed to reproduce the work. |
| Experiment Setup | No | The paper is theoretical and details the design of algorithms and their mathematical properties. It does not include an experimental setup section with concrete hyperparameters, training configurations, or system-level settings for empirical evaluation. |
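
For readers who want to see the control flow of Algorithm 1 at a glance, below is a minimal Python sketch of the loop quoted in the Pseudocode row. This is not the paper's implementation: the callables `compute_xt`, `sample_action`, `get_feedback`, and `update_estimates` are hypothetical placeholders standing in for the paper's equations (2)-(5), which are defined in the paper but not reproduced in this summary.

```python
import numpy as np


def run_hybrid_semi_bandit(T, d, compute_xt, sample_action, get_feedback, update_estimates):
    """Loop-level sketch of Algorithm 1 (combinatorial semi-bandits).

    All four callables are hypothetical hooks for the paper's equations
    (2)-(5); only the overall control flow matches the pseudocode above.
    """
    # Step 1: initialize the loss predictions m_1 with 1/2 in every coordinate.
    m = np.full(d, 0.5)
    loss_estimates = []  # running list of estimated loss vectors (the hat-ell_t's)

    for t in range(1, T + 1):
        # Step 3: compute the marginal vector x_t (the paper's equation (2)),
        # using the current prediction m and the past loss estimates.
        x_t = compute_xt(m, loss_estimates, t)

        # Step 4: draw a combinatorial action a_t in {0,1}^d with E[a_t | x_t] = x_t,
        # play it, and observe losses only for the chosen components (semi-bandit feedback).
        a_t = sample_action(x_t)
        observed = {i: get_feedback(t, i) for i in range(d) if a_t[i] == 1}

        # Step 5: update the loss estimator and the prediction m_{t+1}
        # from the partial feedback (the paper's equations (3)-(5)).
        ell_hat, m = update_estimates(x_t, a_t, observed, m)
        loss_estimates.append(ell_hat)

    return loss_estimates
```

The sketch only fixes the order of operations (compute marginals, sample an unbiased action, observe partial feedback, update estimators and predictions); the specific optimistic-FTRL updates that give the hybrid regret bound live inside the placeholder hooks.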