Distributed Linear Bandits under Communication Constraints
Authors: Sudeep Salgia, Qing Zhao
ICML 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we provide empirical evidence that corroborates our theoretical findings. We compare our proposed PLS algorithm with three popular distributed linear bandit algorithms, namely, Distributed Elimination for Linear Bandits (DELB) (Wang et al., 2019), Federated Phased Elimination (Fed-PE) (Huang et al., 2021) and Distributed Batch Elimination Linear Upper Confidence Bound (DisBE-LUCB) (Amani et al., 2022). |
| Researcher Affiliation | Academia | Sudeep Salgia 1 Qing Zhao 1 1Department of Electrical and Computer Engineering, Cornell University, Ithaca, NY, USA. Correspondence to: Sudeep Salgia <ss3827@cornell.edu>. |
| Pseudocode | Yes | The pseudocode for the norm estimation stage is given in Algorithms 1 and 2. |
| Open Source Code | No | The information is insufficient. The paper does not explicitly state that source code for its methodology is available or provide a link to a repository. |
| Open Datasets | No | The information is insufficient. The paper describes synthetic data generation for its experiments but does not provide access information or citations for any publicly available or open dataset. |
| Dataset Splits | No | The information is insufficient. The paper conducts simulations for a bandit problem and does not describe training, validation, or test dataset splits in the conventional sense. |
| Hardware Specification | No | The information is insufficient. The paper describes the experimental setup and parameters but does not provide any specific hardware details (e.g., GPU/CPU models, memory, cloud instances) used for running the simulations. |
| Software Dependencies | No | The information is insufficient. The paper does not specify any software dependencies, libraries, or their version numbers required for reproducibility of the experiments. |
| Experiment Setup | Yes | We consider a distributed linear bandit instance with d = 20, M = 10 agents which is run for a time horizon of T = 10^6 steps. The underlying mean reward vector is drawn uniformly from the surface of a unit ball. The rewards are corrupted with a zero mean Gaussian with unit variance. |
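The setup quoted above (d = 20 dimensions, M = 10 agents, horizon T = 10^6, a mean reward vector drawn uniformly from the unit sphere, and unit-variance Gaussian noise) can be sketched as a minimal simulation environment. This is an illustrative sketch only; the function names and structure are assumptions, not taken from the paper's code.

```python
import numpy as np

def make_instance(d=20, seed=0):
    """Draw the mean reward vector uniformly from the surface of the unit ball,
    as described in the experiment setup. A normalized Gaussian vector is
    uniformly distributed on the unit sphere."""
    rng = np.random.default_rng(seed)
    theta = rng.standard_normal(d)
    return theta / np.linalg.norm(theta)

def pull(theta, action, rng):
    """One noisy observation: linear mean reward corrupted by
    zero-mean, unit-variance Gaussian noise."""
    return float(action @ theta) + rng.standard_normal()

# Hypothetical usage: M = 10 agents each pull one arm per round.
theta = make_instance(d=20)
rng = np.random.default_rng(1)
rewards = [pull(theta, theta, rng) for _ in range(10)]  # 10 agents, one round
```

Replacing the fixed action `theta` with each agent's chosen arm and looping over T = 10^6 rounds reproduces the scale of the reported simulations.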