Regret Bounds for Batched Bandits

Authors: Hossein Esfandiari, Amin Karbasi, Abbas Mehrabian, Vahab Mirrokni

Venue: AAAI 2021

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | In this paper, we study the problem of batch policies in the context of multi-armed and linear bandits, with the goal of minimizing regret, the standard benchmark for comparing the performance of bandit policies (a formal statement is sketched after the table). We prove bounds on their expected regret that improve and extend the best known regret bounds of Gao, Han, Ren, and Zhou (NeurIPS 2019), for any number of batches. We advance the theoretical understanding of these problems by designing algorithms and proving accompanying hardness results.
Researcher Affiliation | Collaboration | Hossein Esfandiari (1), Amin Karbasi (2), Abbas Mehrabian (3), Vahab Mirrokni (1). (1) Google Research, New York City, New York, USA; (2) School of Engineering and Applied Science, Yale University, New Haven, Connecticut, USA; (3) McGill University, Montréal, Quebec, Canada
Pseudocode | Yes | Algorithm 1: Batched arm elimination for stochastic multi-armed bandits (a minimal sketch of this style of algorithm appears after the table).
Open Source Code | No | The paper does not provide any statements or links indicating that source code for the described methodology is publicly available.
Open Datasets | No | The paper is theoretical, focusing on algorithm design and regret bounds for bandit problems, and does not use or provide information about specific training datasets.
Dataset Splits | No | As a theoretical paper, it does not describe experimental validation on dataset splits.
Hardware Specification | No | The paper is theoretical and does not discuss hardware specifications for running experiments.
Software Dependencies | No | The paper is theoretical and does not specify any software dependencies with version numbers.
Experiment Setup | No | The paper is theoretical, focusing on algorithm design and analysis rather than empirical experiments, and therefore does not provide details on experimental setup or hyperparameters.
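
For context on the Research Type row: the regret that a batch policy minimizes is the standard pseudo-regret for stochastic bandits. A compact statement, using generic bandit notation rather than the paper's own symbols:

```latex
% Pseudo-regret over horizon T for a K-armed bandit with mean rewards
% \mu_1, \dots, \mu_K, optimal mean \mu^\ast = \max_i \mu_i, and A_t
% the arm the policy pulls at round t:
R_T = T\,\mu^\ast - \mathbb{E}\!\left[\sum_{t=1}^{T} \mu_{A_t}\right]
```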
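
The Pseudocode row refers to Algorithm 1, a batched arm-elimination scheme: arms are pulled on a pre-committed batch grid, and after each batch the arms whose confidence intervals fall below the best surviving arm's are discarded. Below is a minimal, hypothetical Python sketch of this style of algorithm. The geometric batch grid t_m = T^(m/M), the confidence radius, and the Bernoulli reward model are simplifying assumptions for illustration, not the paper's exact Algorithm 1 or the schedule behind its bounds.

```python
import numpy as np

def batched_arm_elimination(means, T, M, seed=None):
    """Sketch of batched arm elimination for a stochastic multi-armed
    bandit with Bernoulli rewards.

    means : true (unknown) arm means, used here only to simulate rewards
    T     : time horizon (assumed >= number of arms)
    M     : number of batches
    Returns the pseudo-regret implied by the final pull counts.
    """
    rng = np.random.default_rng(seed)
    K = len(means)
    active = list(range(K))
    pulls = np.zeros(K)    # pulls per arm
    rewards = np.zeros(K)  # cumulative reward per arm
    t = 0
    for m in range(1, M + 1):
        # Hypothetical geometric batch grid t_m = T^(m/M); the paper's
        # exact grid differs.
        batch_end = T if m == M else int(T ** (m / M))
        per_arm = max(1, (batch_end - t) // len(active))
        for a in active:
            n = min(per_arm, T - t)
            if n == 0:
                break
            rewards[a] += rng.binomial(1, means[a], size=n).sum()
            pulls[a] += n
            t += n
        # Drop every arm whose upper confidence bound falls below the
        # best lower confidence bound among the surviving arms.
        mu_hat = rewards[active] / pulls[active]
        radius = np.sqrt(2.0 * np.log(T) / pulls[active])
        best_lcb = np.max(mu_hat - radius)
        active = [a for a, ucb in zip(active, mu_hat + radius) if ucb >= best_lcb]
        if t >= T or len(active) == 1:
            break
    # Spend any leftover budget on the empirically best surviving arm.
    if t < T:
        best = max(active, key=lambda a: rewards[a] / pulls[a])
        pulls[best] += T - t
    return T * max(means) - float(pulls @ means)

# Example: 3 arms, horizon 10^5, 4 batches.
# print(batched_arm_elimination([0.5, 0.45, 0.3], T=100_000, M=4, seed=0))
```

The defining constraint of the batched setting is that arm statistics are updated only M times rather than after every pull, so the elimination rule must be conservative enough to survive an entire batch without feedback; this is why the number and placement of batches, not just the total horizon, drive the achievable regret.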