Coordinated Versus Decentralized Exploration In Multi-Agent Multi-Armed Bandits
Authors: Mithun Chakraborty, Kai Yee Phoebe Chua, Sanmay Das, Brendan Juba
IJCAI 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We then introduce an algorithm for the decentralized setting that uses a value-of-information based communication strategy and an exploration-exploitation strategy based on the centralized algorithm, and show experimentally that it converges rapidly to the performance of the centralized method. (Abstract); In this section, we describe two sets of experiments we ran to compare the performance of the decentralized multi-agent MAB exploration-exploitation algorithm with VoI communication strategy that we proposed in Section 5 with several benchmarks described below. (Section 6) |
| Researcher Affiliation | Academia | Washington University in St. Louis; University of California, Irvine |
| Pseudocode | Yes | Figure 1: Flow-chart showing the steps of the VoI and simple communication strategies for an arbitrary agent j at the end of an action round, as described in Section 5. |
| Open Source Code | No | The paper does not contain any explicit statement about releasing open-source code for the methodology, nor does it provide a link to a code repository. |
| Open Datasets | No | The paper uses synthetic data generated based on specified parameters (Gaussian rewards, arithmetic sequence means) but does not provide access information (link, DOI, citation) to a public or open dataset. |
| Dataset Splits | No | The paper describes a simulation environment and experiments, but it does not specify training, validation, or test dataset splits (e.g., percentages or sample counts) as would be common for machine learning datasets. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., GPU/CPU models, memory) used for conducting the experiments. |
| Software Dependencies | No | The paper does not specify any ancillary software dependencies (e.g., libraries, frameworks, solvers) along with their version numbers. |
| Experiment Setup | Yes | For each experiment, the number of agents is set at m = 25n, n being the number of arms. The means of the Gaussian reward distributions on the bandit arms form a decreasing arithmetic sequence starting at µmax = µ1 = 1 and ending at µmin = µn = 0.05, so that the magnitude of the common difference is Θ(1/n); the shared standard deviation σ = 0.1 is independent of the number of arms. (Section 6); Each data point is generated by averaging the regret values over Nsim = 10^5 repetitions. We set δ = 0.01, εvoi = 0.05/Nsim to ensure that our confidence bounds hold for all experiments. (Section 6) — a minimal simulation sketch of this setup follows the table. |
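
The Experiment Setup row pins down the reward model precisely: n arms with Gaussian rewards (σ = 0.1) whose means form a decreasing arithmetic sequence from 1 to 0.05, with per-run regret averaged over many repetitions. The sketch below reconstructs just that environment and the averaging protocol in Python/NumPy. The names `make_arm_means`, `ucb1`, and `average_regret` are hypothetical, and UCB1 is a stand-in single-learner policy, not the paper's centralized or decentralized algorithm (which coordinates m = 25n agents per round).

```python
import numpy as np

def make_arm_means(n_arms, mu_max=1.0, mu_min=0.05):
    # Decreasing arithmetic sequence of arm means, as in Section 6:
    # starts at mu_1 = 1, ends at mu_n = 0.05, common difference Theta(1/n).
    return np.linspace(mu_max, mu_min, n_arms)

def ucb1(counts, sums, t):
    # Placeholder policy (standard UCB1); NOT the paper's centralized
    # or decentralized multi-agent algorithm.
    if np.any(counts == 0):
        return int(np.argmin(counts))  # pull each arm once first
    ucb = sums / counts + np.sqrt(2.0 * np.log(t + 1) / counts)
    return int(np.argmax(ucb))

def average_regret(n_arms, horizon, policy=ucb1, n_sim=1000, sigma=0.1, seed=0):
    # Mean cumulative pseudo-regret over n_sim independent repetitions;
    # the paper averages over N_sim = 10^5 runs, reduced here for speed.
    rng = np.random.default_rng(seed)
    means = make_arm_means(n_arms)
    gaps = means.max() - means  # per-pull regret of each arm
    total = 0.0
    for _ in range(n_sim):
        counts = np.zeros(n_arms)
        sums = np.zeros(n_arms)
        regret = 0.0
        for t in range(horizon):
            arm = policy(counts, sums, t)
            reward = rng.normal(means[arm], sigma)  # Gaussian reward, sigma = 0.1
            counts[arm] += 1
            sums[arm] += reward
            regret += gaps[arm]
        total += regret
    return total / n_sim

if __name__ == "__main__":
    # n = 4 arms; the paper's setting would deploy m = 25 * 4 = 100 agents,
    # whereas this sketch runs a single learner over the same reward model.
    print(average_regret(n_arms=4, horizon=500))
```

Swapping `policy` for an implementation of the paper's exploration-exploitation strategy would reproduce the reported averaging protocol; the default `n_sim` is lowered from the paper's 10^5 purely to keep the sketch fast to run.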