Distributed Linear Bandits under Communication Constraints

Authors: Sudeep Salgia, Qing Zhao

ICML 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this section, we provide empirical evidence that corroborates our theoretical findings. We compare our proposed PLS algorithm with three popular distributed linear bandit algorithms, namely, Distributed Elimination for Linear Bandits (DELB) (Wang et al., 2019), Federated Phased Elimination (Fed-PE) (Huang et al., 2021) and Distributed Batch Elimination Linear Upper Confidence Bound (DisBE-LUCB) (Amani et al., 2022).
Researcher Affiliation | Academia | Sudeep Salgia and Qing Zhao, Department of Electrical and Computer Engineering, Cornell University, Ithaca, NY, USA. Correspondence to: Sudeep Salgia <ss3827@cornell.edu>.
Pseudocode | Yes | The pseudo code for the norm estimation stage is given in Algorithms 1 and 2.
Open Source Code | No | The information is insufficient. The paper does not explicitly state that source code for its methodology is available or provide a link to a repository.
Open Datasets | No | The information is insufficient. The paper describes synthetic data generation for its experiments but does not provide access information or citations for any publicly available or open dataset.
Dataset Splits | No | The information is insufficient. The paper conducts simulations for a bandit problem and does not describe training, validation, or test dataset splits in the conventional sense.
Hardware Specification | No | The information is insufficient. The paper describes the experimental setup and parameters but does not provide any specific hardware details (e.g., GPU/CPU models, memory, cloud instances) used for running the simulations.
Software Dependencies | No | The information is insufficient. The paper does not specify any software dependencies, libraries, or their version numbers required for reproducibility of the experiments.
Experiment Setup | Yes | We consider a distributed linear bandit instance with d = 20, M = 10 agents which is run for a time horizon of T = 10^6 steps. The underlying mean reward vector is drawn uniformly from the surface of a unit ball. The rewards are corrupted with a zero mean Gaussian with unit variance.
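
For concreteness, the described synthetic instance can be sketched in a few lines of Python. This is a minimal illustration, not the authors' released code: it assumes the standard Gaussian-normalization construction for a uniform draw from the unit sphere, and the variable names and the per-agent action sampling at the end are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

d = 20       # dimension of the feature space (from the paper)
M = 10       # number of agents (from the paper)
T = 10**6    # time horizon in steps (from the paper)

# Mean reward vector drawn uniformly from the surface of the unit ball:
# normalize a standard Gaussian vector.
theta = rng.standard_normal(d)
theta /= np.linalg.norm(theta)

def reward(action, rng=rng):
    """Noisy linear reward: <theta, action> plus zero-mean, unit-variance Gaussian noise."""
    return float(action @ theta) + rng.standard_normal()

# Illustrative usage: each of the M agents observes one reward for a random unit-norm action.
actions = rng.standard_normal((M, d))
actions /= np.linalg.norm(actions, axis=1, keepdims=True)
observations = [reward(a) for a in actions]
```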