Regret Bounds for Batched Bandits

Authors: Hossein Esfandiari, Amin Karbasi, Abbas Mehrabian, Vahab Mirrokni

Venue: AAAI 2021

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | In this paper, we study the problem of batch policies in the context of multi-armed and linear bandits, with the goal of minimizing regret, the standard benchmark for comparing the performance of bandit policies (a formal statement is sketched after the table). We prove bounds on their expected regret that improve and extend the best known regret bounds of Gao, Han, Ren, and Zhou (NeurIPS 2019), for any number of batches. We advance the theoretical understanding of these problems by designing algorithms and proving accompanying hardness results.
Researcher Affiliation | Collaboration | Hossein Esfandiari (1), Amin Karbasi (2), Abbas Mehrabian (3), Vahab Mirrokni (1). (1) Google Research, New York City, New York, USA; (2) School of Engineering and Applied Science, Yale University, New Haven, Connecticut, USA; (3) McGill University, Montréal, Quebec, Canada
Pseudocode | Yes | Algorithm 1: Batched arm elimination for stochastic multi-armed bandits (a minimal sketch of this style of algorithm appears after the table).
Open Source Code | No | The paper does not provide any statements or links indicating that source code for the described methodology is publicly available.
Open Datasets | No | The paper is theoretical, focusing on algorithm design and regret bounds for bandit problems, and does not use or provide information about specific training datasets.
Dataset Splits | No | As a theoretical paper, it does not describe experimental validation on dataset splits.
Hardware Specification | No | The paper is theoretical and does not discuss hardware specifications for running experiments.
Software Dependencies | No | The paper is theoretical and does not specify any software dependencies with version numbers.
Experiment Setup | No | The paper is theoretical, focusing on algorithm design and analysis rather than empirical experiments, and therefore does not provide details on experimental setup or hyperparameters.
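
For context on the Research Type row: the regret that a batch policy minimizes is the standard pseudo-regret for stochastic bandits. A compact statement, using generic bandit notation rather than the paper's own symbols:

```latex
% Pseudo-regret over horizon T for a K-armed bandit with mean rewards
% \mu_1, \dots, \mu_K, optimal mean \mu^\ast = \max_i \mu_i, and A_t
% the arm the policy pulls at round t:
R_T = T\,\mu^\ast - \mathbb{E}\!\left[\sum_{t=1}^{T} \mu_{A_t}\right]
```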
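
The Pseudocode row refers to Algorithm 1, a batched arm-elimination scheme: arms are pulled on a pre-committed batch grid, and after each batch the arms whose confidence intervals fall below the best surviving arm's are discarded. Below is a minimal, hypothetical Python sketch of this style of algorithm. The geometric batch grid t_m = T^(m/M), the confidence radius, and the Bernoulli reward model are simplifying assumptions for illustration, not the paper's exact Algorithm 1 or the schedule behind its bounds.

```python
import numpy as np

def batched_arm_elimination(means, T, M, seed=None):
    """Sketch of batched arm elimination for a stochastic multi-armed
    bandit with Bernoulli rewards.

    means : true (unknown) arm means, used here only to simulate rewards
    T     : time horizon (assumed >= number of arms)
    M     : number of batches
    Returns the pseudo-regret implied by the final pull counts.
    """
    rng = np.random.default_rng(seed)
    K = len(means)
    active = list(range(K))
    pulls = np.zeros(K)    # pulls per arm
    rewards = np.zeros(K)  # cumulative reward per arm
    t = 0
    for m in range(1, M + 1):
        # Hypothetical geometric batch grid t_m = T^(m/M); the paper's
        # exact grid differs.
        batch_end = T if m == M else int(T ** (m / M))
        per_arm = max(1, (batch_end - t) // len(active))
        for a in active:
            n = min(per_arm, T - t)
            if n == 0:
                break
            rewards[a] += rng.binomial(1, means[a], size=n).sum()
            pulls[a] += n
            t += n
        # Drop every arm whose upper confidence bound falls below the
        # best lower confidence bound among the surviving arms.
        mu_hat = rewards[active] / pulls[active]
        radius = np.sqrt(2.0 * np.log(T) / pulls[active])
        best_lcb = np.max(mu_hat - radius)
        active = [a for a, ucb in zip(active, mu_hat + radius) if ucb >= best_lcb]
        if t >= T or len(active) == 1:
            break
    # Spend any leftover budget on the empirically best surviving arm.
    if t < T:
        best = max(active, key=lambda a: rewards[a] / pulls[a])
        pulls[best] += T - t
    return T * max(means) - float(pulls @ means)

# Example: 3 arms, horizon 10^5, 4 batches.
# print(batched_arm_elimination([0.5, 0.45, 0.3], T=100_000, M=4, seed=0))
```

The defining constraint of the batched setting is that arm statistics are updated only M times rather than after every pull, so the elimination rule must be conservative enough to survive an entire batch without feedback; this is why the number and placement of batches, not just the total horizon, drive the achievable regret.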