Communication-Efficient Collaborative Regret Minimization in Multi-Armed Bandits

Authors: Nikolai Karpov, Qin Zhang

AAAI 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Theoretical In this paper, we study the collaborative learning model, which concerns the tradeoff between parallelism and communication overhead in multi-agent multi-armed bandits. For regret minimization in multi-armed bandits, we present the first set of tradeoffs between the number of rounds of communication among the agents and the regret of the collaborative learning process. Our main result is a lower bound for MAB in the CL model (Theorem 2). We show that for any T-time K-agent collaborative algorithm AK, there is an input I such that if AK runs on I using at most R log(KT ) 2 log log log(KT ) rounds, then AK incurs an expected re-gret of Ω min{K, (KT) 1 R } 1 (I) . We also design an algorithm for batched MAB (Theorem 17). Again via Observation 1, we obtain an algorithm for MAB in the CL model. Our upper bound matches the lower bound up to logarithmic factors in regret.
Researcher Affiliation Academia Nikolai Karpov, Qin Zhang Indiana University Bloomington, IN 47405, USA nkarpov@iu.edu, qzhangcs@indiana.edu
Pseudocode Yes Algorithm 1: BATCHEDMAB(I, λ, T)
Open Source Code No The paper is theoretical and focuses on algorithm design and theoretical bounds. It does not provide any statement about releasing open-source code for the described methodology or a link to a code repository.
Open Datasets No The paper is theoretical and defines problem setups conceptually (e.g., 'Bernoulli arms') rather than using specific, publicly available datasets for empirical training. No dataset access information is provided.
Dataset Splits No The paper is theoretical and does not describe empirical experiments or data splits for training, validation, or testing.
Hardware Specification No The paper is theoretical and does not describe any specific hardware specifications used for running experiments.
Software Dependencies No The paper is theoretical and focuses on algorithm design and theoretical proofs. It does not list any specific software dependencies with version numbers.
Experiment Setup No The paper describes a theoretical algorithm (Algorithm 1) but does not provide specific experimental setup details such as hyperparameter values, training configurations, or system-level settings, as it does not report empirical experiments.