Hybrid Regret Bounds for Combinatorial Semi-Bandits and Adversarial Linear Bandits

Authors: Shinji Ito

NeurIPS 2021

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | This study aims to develop bandit algorithms that automatically exploit tendencies of certain environments to improve performance, without any prior knowledge of the environments. We first propose an algorithm for combinatorial semi-bandits with a hybrid regret bound that includes two main features: a best-of-three-worlds guarantee and multiple data-dependent regret bounds. The former means that the algorithm will work nearly optimally in all environments in an adversarial setting, a stochastic setting, or a stochastic setting with adversarial corruptions. The latter implies that, even if the environment is far from exhibiting stochastic behavior, the algorithm will perform better as long as the environment is "easy" in terms of certain metrics. We also show hybrid data-dependent regret bounds for adversarial linear bandits, which include a first path-length regret bound that is tight up to logarithmic factors.
Researcher Affiliation | Industry | Shinji Ito, NEC Corporation, i-shinji@nec.com
Pseudocode | Yes | Algorithm 1 (Hybrid algorithm for combinatorial semi-bandits). Require: action set A, time horizon T ∈ ℕ. 1: Initialize m_1 ∈ [0, 1]^d by m_{1i} = 1/2 for all i ∈ [d]. 2: for t = 1, 2, ..., T do 3: Compute x_t as in (2), where ℓ̂_j, ψ_t and β_{ti} are defined in (3) and (4), respectively. 4: Pick a_t so that E[a_t | x_t] = x_t, output a_t, and get feedback of ℓ_{ti} for each i such that a_{ti} = 1. 5: Compute ℓ̂_t, β_{t+1,i} and m_{t+1} on the basis of (3), (4) and (5), respectively. 6: end for
Open Source Code | No | The paper does not provide any statement about releasing source code or a link to a code repository for the described methodology.
Open Datasets | No | The paper is theoretical and focuses on algorithm design and regret bounds. It does not describe experiments using specific datasets, nor does it provide information about public availability of any datasets for training.
Dataset Splits | No | The paper is theoretical and does not describe any experimental validation using data splits. Thus, no information on training/validation/test splits is provided.
Hardware Specification | No | The paper is theoretical and does not mention any specific hardware used for running experiments or computations.
Software Dependencies | No | The paper does not specify any software names with version numbers (e.g., programming languages, libraries, or solvers) that would be needed to reproduce the work.
Experiment Setup | No | The paper is theoretical and details the design of algorithms and their mathematical properties. It does not include an experimental setup section with concrete hyperparameters, training configurations, or system-level settings for empirical evaluation.
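The outer loop of Algorithm 1 can be illustrated with a simplified, self-contained sketch. The paper's actual update rules (Eqs. (2)-(5), which combine a best-of-three-worlds regularizer with data-dependent learning rates) are not reproduced in this report, so the sketch below substitutes a plain exponential-weights distribution over the action set and a standard importance-weighted loss estimator; the function name, the fixed learning rate `eta`, and these substitutions are illustrative assumptions, not the paper's method.

```python
import math
import random


def semi_bandit_exp_weights(actions, losses, eta=0.1, rng=None):
    """Run a simplified exponential-weights learner with semi-bandit feedback.

    actions: list of 0/1 tuples forming the action set A
    losses:  per-round loss vectors; the learner only observes the
             coordinates i with a_ti = 1
    Returns the list of actions chosen in each round.
    """
    rng = rng if rng is not None else random.Random(0)
    d = len(actions[0])
    cum = [0.0] * len(actions)  # cumulative *estimated* loss per action
    chosen = []
    for loss in losses:
        # Distribution over actions (a stand-in for the paper's Eq. (2)).
        w = [math.exp(-eta * c) for c in cum]
        z = sum(w)
        p = [wi / z for wi in w]
        # x_t = E[a_t]: marginal selection probability of each coordinate.
        x = [sum(p[k] * actions[k][i] for k in range(len(actions)))
             for i in range(d)]
        # Sample a_t so that E[a_t | x_t] = x_t (line 4 of Algorithm 1).
        a = rng.choices(actions, weights=p)[0]
        chosen.append(a)
        # Importance-weighted estimator ell_hat_ti = ell_ti * a_ti / x_ti,
        # built from the revealed coordinates only (a stand-in for Eq. (3)).
        est = [loss[i] * a[i] / x[i] if x[i] > 0 else 0.0 for i in range(d)]
        for k, act in enumerate(actions):
            cum[k] += sum(est[i] for i in range(d) if act[i])
    return chosen
```

On a toy two-action instance where one coordinate always incurs loss 1 and the other loss 0, the learner's selection probability shifts toward the better action over the rounds, which is the qualitative behavior the regret bounds quantify.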