On Interpolating Experts and Multi-Armed Bandits

Authors: Houshuang Chen, Yuchen He, Chihao Zhang

ICML 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Theoretical We prove tight minimax regret bounds for m-MAB and design an optimal PAC algorithm for its pure exploration version, m BAI, where the goal is to identify the arm with minimum loss with as few rounds as possible. We show that the minimax regret of m-MAB is T PK k=1 log(mk + 1) and the minimum number of pulls for an (ε, 0.05)-PAC algorithm of m-BAI is 1 ε2 PK k=1 log(mk + 1) . Both our upper bounds and lower bounds for m-MAB can be extended to a more general setting, namely the bandit with graph feedback, in terms of the clique cover and related graph parameters. As consequences, we obtained tight minimax regret bounds for several families of feedback graphs.
Researcher Affiliation Academia Houshuang Chen 1 Yuchen He 1 Chihao Zhang 2 1Department of Computer Science and Engineering, Shanghai Jiao Tong University, Shanghai, China 2John Hopcroft Center for Computer Science, Shanghai Jiao Tong University, Shanghai, China.
Pseudocode Yes Algorithm 1 Two-Stage Algorithm for m-MAB
Open Source Code No The paper does not include any explicit statement or link indicating that its source code is available.
Open Datasets No This is a theoretical paper focusing on mathematical proofs and algorithm design, not empirical evaluation on datasets. It refers to theoretical constructs like 'Ber(1/2) distribution' and 'Gaussian distribution N(0, sigma^2)' for proofs, but these are not publicly available datasets in the context of experiments.
Dataset Splits No This is a theoretical paper and does not involve empirical experiments with data splits for training, validation, or testing.
Hardware Specification No As a theoretical paper, no hardware specifications for running experiments are mentioned.
Software Dependencies No As a theoretical paper, no software dependencies with specific version numbers are provided for reproducibility of experiments.
Experiment Setup No As a theoretical paper, there are no experimental setup details like hyperparameters or training configurations mentioned.