Coordinated Versus Decentralized Exploration In Multi-Agent Multi-Armed Bandits

Authors: Mithun Chakraborty, Kai Yee Phoebe Chua, Sanmay Das, Brendan Juba

IJCAI 2017

Reproducibility assessment: each entry lists the variable, the assessed result, and the LLM response supporting that result.

Research Type: Experimental
LLM Response: "We then introduce an algorithm for the decentralized setting that uses a value-of-information based communication strategy and an exploration-exploitation strategy based on the centralized algorithm, and show experimentally that it converges rapidly to the performance of the centralized method." (Abstract); "In this section, we describe two sets of experiments we ran to compare the performance of the decentralized multi-agent MAB exploration-exploitation algorithm with VoI communication strategy that we proposed in Section 5 with several benchmarks described below." (Section 6)

Researcher Affiliation: Academia
LLM Response: "Washington University in St. Louis; University of California, Irvine"

Pseudocode: Yes
LLM Response: "Figure 1: Flow-chart showing the steps of the VoI and simple communication strategies for an arbitrary agent j at the end of an action round, as described in Section 5."

Open Source Code: No
LLM Response: "The paper does not contain any explicit statement about releasing open-source code for the methodology, nor does it provide a link to a code repository."

Open Datasets: No
LLM Response: "The paper uses synthetic data generated based on specified parameters (Gaussian rewards, arithmetic-sequence means) but does not provide access information (link, DOI, citation) to a public or open dataset."

Dataset Splits: No
LLM Response: "The paper describes a simulation environment and experiments, but it does not specify training, validation, or test dataset splits (e.g., percentages or sample counts) as would be common for machine learning datasets."

Hardware Specification: No
LLM Response: "The paper does not provide any specific details about the hardware (e.g., GPU/CPU models, memory) used for conducting the experiments."

Software Dependencies: No
LLM Response: "The paper does not specify any ancillary software dependencies (e.g., libraries, frameworks, solvers) along with their version numbers."

Experiment Setup: Yes
LLM Response: "For each experiment, the number of agents is set at m = 25n, n being the number of arms. The means of the Gaussian reward distributions on the bandit arms form a decreasing arithmetic sequence starting at µ_max = µ_1 = 1 and ending at µ_min = µ_n = 0.05, so that the magnitude of the common difference is Θ(1/n); the shared standard deviation σ = 0.1 is independent of the number of arms." (Section 6); "Each data-point is generated by averaging the regret values over N_sim = 10^5 repetitions. We set δ = 0.01/N_sim, ε_voi = 0.05 to ensure that our confidence bounds hold for all experiments." (Section 6)

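To make the quoted setup concrete, the following is a minimal sketch (in Python) of the synthetic Gaussian bandit environment those parameters describe. The names (`make_bandit`, `pull`, `m_agents`) and the use of NumPy are illustrative assumptions, not taken from the paper; the arm means are an arithmetic sequence from 1 down to 0.05, so the common difference has magnitude (1 - 0.05)/(n - 1), i.e., Θ(1/n).

```python
import numpy as np

def make_bandit(n_arms, mu_max=1.0, mu_min=0.05, sigma=0.1, seed=None):
    """Synthetic Gaussian bandit matching the quoted setup.

    Arm means form a decreasing arithmetic sequence from mu_max to mu_min;
    all arms share the standard deviation sigma. Names are illustrative.
    """
    rng = np.random.default_rng(seed)
    means = np.linspace(mu_max, mu_min, n_arms)  # common difference is Theta(1/n)

    def pull(arm):
        # One noisy reward draw from the chosen arm.
        return rng.normal(means[arm], sigma)

    return means, pull

# Illustrative usage: n arms, m = 25n agents, regret averaged over repetitions
# (the repetition count is reduced here; the paper quotes 10^5).
n_arms = 10
m_agents = 25 * n_arms
n_sim = 100
means, pull = make_bandit(n_arms, seed=0)
print(f"best mean = {means[0]:.2f}, sample reward from arm 0: {pull(0):.3f}")
```

This sketch only instantiates the reward environment; the coordinated and decentralized exploration strategies compared in the paper would be built on top of it.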