Hybrid Regret Bounds for Combinatorial Semi-Bandits and Adversarial Linear Bandits

Authors: Shinji Ito

NeurIPS 2021

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | This study aims to develop bandit algorithms that automatically exploit tendencies of certain environments to improve performance, without any prior knowledge of the environments. We first propose an algorithm for combinatorial semi-bandits with a hybrid regret bound that includes two main features: a best-of-three-worlds guarantee and multiple data-dependent regret bounds. The former means that the algorithm will work nearly optimally in all environments in an adversarial setting, a stochastic setting, or a stochastic setting with adversarial corruptions. The latter implies that, even if the environment is far from exhibiting stochastic behavior, the algorithm will perform better as long as the environment is "easy" in terms of certain metrics. We also show hybrid data-dependent regret bounds for adversarial linear bandits, which include a first path-length regret bound that is tight up to logarithmic factors.
Researcher Affiliation | Industry | Shinji Ito, NEC Corporation, i-shinji@nec.com
Pseudocode | Yes | Algorithm 1 (Hybrid algorithm for combinatorial semi-bandits). Require: action set A, time horizon T ∈ ℕ. 1: Initialize m_1 ∈ [0, 1]^d by m_{1i} = 1/2 for all i ∈ [d]. 2: for t = 1, 2, ..., T do 3: Compute x_t as in (2), where ℓ̂_j, ψ_t and β_{ti} are defined in (3) and (4), respectively. 4: Pick a_t so that E[a_t | x_t] = x_t, output a_t, and get feedback of ℓ_{ti} for each i such that a_{ti} = 1. 5: Compute ℓ̂_t, β_{t+1,i} and m_{t+1} on the basis of (3), (4) and (5), respectively. 6: end for
Open Source Code | No | The paper does not provide any statement about releasing source code or a link to a code repository for the described methodology.
Open Datasets | No | The paper is theoretical and focuses on algorithm design and regret bounds. It does not describe experiments using specific datasets, nor does it provide information about public availability of any datasets for training.
Dataset Splits | No | The paper is theoretical and does not describe any experimental validation using data splits. Thus, no information on training/validation/test splits is provided.
Hardware Specification | No | The paper is theoretical and does not mention any specific hardware used for running experiments or computations.
Software Dependencies | No | The paper does not specify any software names with version numbers (e.g., programming languages, libraries, or solvers) that would be needed to reproduce the work.
Experiment Setup | No | The paper is theoretical and details the design of algorithms and their mathematical properties. It does not include an experimental setup section with concrete hyperparameters, training configurations, or system-level settings for empirical evaluation.
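The outer loop of Algorithm 1 can be illustrated with a simplified, self-contained sketch. The paper's actual update rules (Eqs. (2)-(5), which combine a best-of-three-worlds regularizer with data-dependent learning rates) are not reproduced in this report, so the sketch below substitutes a plain exponential-weights distribution over the action set and a standard importance-weighted loss estimator; the function name, the fixed learning rate `eta`, and these substitutions are illustrative assumptions, not the paper's method.

```python
import math
import random


def semi_bandit_exp_weights(actions, losses, eta=0.1, rng=None):
    """Run a simplified exponential-weights learner with semi-bandit feedback.

    actions: list of 0/1 tuples forming the action set A
    losses:  per-round loss vectors; the learner only observes the
             coordinates i with a_ti = 1
    Returns the list of actions chosen in each round.
    """
    rng = rng if rng is not None else random.Random(0)
    d = len(actions[0])
    cum = [0.0] * len(actions)  # cumulative *estimated* loss per action
    chosen = []
    for loss in losses:
        # Distribution over actions (a stand-in for the paper's Eq. (2)).
        w = [math.exp(-eta * c) for c in cum]
        z = sum(w)
        p = [wi / z for wi in w]
        # x_t = E[a_t]: marginal selection probability of each coordinate.
        x = [sum(p[k] * actions[k][i] for k in range(len(actions)))
             for i in range(d)]
        # Sample a_t so that E[a_t | x_t] = x_t (line 4 of Algorithm 1).
        a = rng.choices(actions, weights=p)[0]
        chosen.append(a)
        # Importance-weighted estimator ell_hat_ti = ell_ti * a_ti / x_ti,
        # built from the revealed coordinates only (a stand-in for Eq. (3)).
        est = [loss[i] * a[i] / x[i] if x[i] > 0 else 0.0 for i in range(d)]
        for k, act in enumerate(actions):
            cum[k] += sum(est[i] for i in range(d) if act[i])
    return chosen
```

On a toy two-action instance where one coordinate always incurs loss 1 and the other loss 0, the learner's selection probability shifts toward the better action over the rounds, which is the qualitative behavior the regret bounds quantify.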