Communication-Efficient Collaborative Regret Minimization in Multi-Armed Bandits
Authors: Nikolai Karpov, Qin Zhang
AAAI 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | In this paper, we study the collaborative learning model, which concerns the tradeoff between parallelism and communication overhead in multi-agent multi-armed bandits. For regret minimization in multi-armed bandits, we present the first set of tradeoffs between the number of rounds of communication among the agents and the regret of the collaborative learning process. Our main result is a lower bound for MAB in the CL model (Theorem 2). We show that for any T-time K-agent collaborative algorithm A_K, there is an input I such that if A_K runs on I using at most R ≤ log(KT) / (2 log log log(KT)) rounds, then A_K incurs an expected regret of Ω(min{K, (KT)^(1/R)} · Δ₁(I)). We also design an algorithm for batched MAB (Theorem 17). Again via Observation 1, we obtain an algorithm for MAB in the CL model. Our upper bound matches the lower bound up to logarithmic factors in regret. |
| Researcher Affiliation | Academia | Nikolai Karpov, Qin Zhang; Indiana University, Bloomington, IN 47405, USA; nkarpov@iu.edu, qzhangcs@indiana.edu |
| Pseudocode | Yes | Algorithm 1: BATCHEDMAB(I, λ, T) |
| Open Source Code | No | The paper is theoretical and focuses on algorithm design and theoretical bounds. It does not provide any statement about releasing open-source code for the described methodology or a link to a code repository. |
| Open Datasets | No | The paper is theoretical and defines problem setups conceptually (e.g., 'Bernoulli arms') rather than using specific, publicly available datasets for empirical training. No dataset access information is provided. |
| Dataset Splits | No | The paper is theoretical and does not describe empirical experiments or data splits for training, validation, or testing. |
| Hardware Specification | No | The paper is theoretical and does not describe any specific hardware specifications used for running experiments. |
| Software Dependencies | No | The paper is theoretical and focuses on algorithm design and theoretical proofs. It does not list any specific software dependencies with version numbers. |
| Experiment Setup | No | The paper describes a theoretical algorithm (Algorithm 1) but does not provide specific experimental setup details such as hyperparameter values, training configurations, or system-level settings, as it does not report empirical experiments. |
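The table above notes that the paper presents pseudocode for a batched MAB algorithm (Algorithm 1, BATCHEDMAB). As an illustrative sketch only, not a reconstruction of the authors' algorithm, the batched paradigm the row refers to can be demonstrated with a simple batched successive-elimination routine on Bernoulli arms; the function name, the equal per-batch pull allocation, and the confidence margin below are all our assumptions:

```python
import math
import random

def batched_elimination(means, T, num_batches, seed=0):
    """Hedged sketch of batched successive elimination for MAB.

    `means` are the true Bernoulli arm means, used only to simulate
    reward draws; the learner sees rewards, never `means` directly.
    Arms are pulled equally within each batch, and arms whose empirical
    mean trails the leader by more than a confidence margin are dropped
    between batches (the only points where "communication" would occur
    in a collaborative/batched setting).
    """
    rng = random.Random(seed)
    active = list(range(len(means)))
    pulls_per_batch = max(1, T // (num_batches * len(means)))
    estimates = {a: 0.0 for a in active}
    counts = {a: 0 for a in active}

    for _ in range(num_batches):
        # Within a batch: pull every active arm the same number of times.
        for a in active:
            for _ in range(pulls_per_batch):
                r = 1 if rng.random() < means[a] else 0
                counts[a] += 1
                estimates[a] += (r - estimates[a]) / counts[a]
        # Between batches: eliminate arms far below the empirical leader.
        # All active arms share the same count, so one margin suffices.
        best = max(estimates[a] for a in active)
        margin = math.sqrt(2 * math.log(T) / counts[active[0]])
        active = [a for a in active if estimates[a] >= best - margin]

    return active, estimates
```

With a clearly separated best arm and a reasonable budget, the routine converges to a single survivor after the first elimination step; fewer batches mean fewer elimination (communication) points, mirroring the rounds-versus-regret tradeoff the paper studies.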