Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026.

Distributed Multi-Agent Bandits Over Erdős-Rényi Random Networks

Authors: Jingyuan Liu, Hao Qiu, Lin F. Yang, Mengfan Xu

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental In this section, we demonstrate the effectiveness of our algorithm through numerical experiments on both synthetic and real-world datasets. The objective is twofold. First, we show that the cumulative regret of our algorithm grows logarithmically with respect to T and is significantly smaller than that of existing benchmarks, thereby validating our theoretical findings. We use DrFed-UCB, proposed by [Xu and Klabjan, 2023], as the baseline. Second, we conduct a simulation study to examine how the regret depends on the link probability p and the algebraic connectivity of the base graph G, as reflected in the regret bound.
Researcher Affiliation Academia Jingyuan Liu (Nanjing University); Hao Qiu (Università degli Studi di Milano); Lin Yang (Nanjing University); Mengfan Xu (University of Massachusetts Amherst)
Pseudocode Yes Algorithm 1 Gossip Successive Elimination for Agent i ∈ [N]
Open Source Code Yes The code for the experiments is available at https://github.com/haoqiu95/multi-agent-bandit.
Open Datasets Yes For real-world experiments, we use the MovieLens dataset and refer to Yi and Vojnovic [2023] for details.
Dataset Splits No For the synthetic experiment setting, we set T = 10,000, N = 16, and K = 5; for the Petersen graph, we use N = 10 by definition. For the comparison with DrFed-UCB, we consider a complete graph and a high link probability (p = 0.9), as required therein. Before the game starts, we sample each $q_i$ independently and uniformly from the interval [0, 1] for each agent i. The local mean reward of arm k on agent i is given by $\mu_{i,k} = q_i \cdot \frac{k-1}{K-1}$, and the global mean reward of arm k is µk = k 1 N . At each time step t, each agent i ∈ [N] selects an arm and observes the local reward. For real-world experiments, we use the MovieLens dataset and refer to Yi and Vojnovic [2023] for details. We set the horizon T = 10,000, and select 20 users as agents (N = 20) and 5 genres as arms (K = 5). At each time step t, each agent randomly selects a movie from the genres. All ratings (rewards) of movies are normalised to [0, 1].
Hardware Specification No Our experiments can be run on a laptop, and thus our paper does not include any further information on computing resources.
Software Dependencies No The paper does not specify any software dependencies with version numbers used for the experiments.
Experiment Setup Yes Experimental Settings. For the synthetic experiment setting, we set T = 10,000, N = 16, and K = 5; for the Petersen graph, we use N = 10 by definition. For the comparison with DrFed-UCB, we consider a complete graph and a high link probability (p = 0.9), as required therein. Before the game starts, we sample each $q_i$ independently and uniformly from the interval [0, 1] for each agent i. The local mean reward of arm k on agent i is given by $\mu_{i,k} = q_i \cdot \frac{k-1}{K-1}$, and the global mean reward of arm k is µk = k 1 N . At each time step t, each agent i ∈ [N] selects an arm and observes the local reward. For real-world experiments, we use the MovieLens dataset and refer to Yi and Vojnovic [2023] for details. We set the horizon T = 10,000, and select 20 users as agents (N = 20) and 5 genres as arms (K = 5). At each time step t, each agent randomly selects a movie from the genres. All ratings (rewards) of movies are normalised to [0, 1].
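
The setup above refers to two graph quantities: the algebraic connectivity of the base graph G and the link probability p. The following is a minimal sketch of both, not the authors' code (see the repository linked above for that). It assumes that algebraic connectivity means the second-smallest eigenvalue of the graph Laplacian, and that "link probability" means each edge of the base graph is kept independently with probability p at each time step. The function names `petersen_adjacency`, `algebraic_connectivity`, and `sample_active_links` are hypothetical and chosen only for illustration.

```python
import numpy as np


def petersen_adjacency() -> np.ndarray:
    """Adjacency matrix of the Petersen graph (10 nodes, 15 edges, 3-regular)."""
    A = np.zeros((10, 10), dtype=int)
    for i in range(5):
        edges = [
            (i, (i + 1) % 5),          # outer 5-cycle
            (i, i + 5),                # spoke from outer node to inner node
            (i + 5, (i + 2) % 5 + 5),  # inner pentagram
        ]
        for u, v in edges:
            A[u, v] = A[v, u] = 1
    return A


def algebraic_connectivity(A: np.ndarray) -> float:
    """Second-smallest eigenvalue of the Laplacian L = D - A (assumed definition)."""
    L = np.diag(A.sum(axis=1)) - A
    # eigvalsh returns the eigenvalues of a symmetric matrix in ascending order.
    return float(np.linalg.eigvalsh(L)[1])


def sample_active_links(A: np.ndarray, p: float, rng: np.random.Generator) -> np.ndarray:
    """Keep each edge of the base graph independently with probability p.

    Assumption: this is one plausible reading of the "link probability" p;
    the excerpt above does not spell out the sampling model.
    """
    upper = np.triu(A, k=1)            # count each undirected edge once
    keep = rng.random(A.shape) < p     # independent Bernoulli(p) draws
    active = upper * keep
    return active + active.T           # symmetrize back to an undirected graph


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    petersen = petersen_adjacency()
    complete16 = np.ones((16, 16), dtype=int) - np.eye(16, dtype=int)

    print("Petersen graph:", algebraic_connectivity(petersen))
    print("Complete graph, N = 16:", algebraic_connectivity(complete16))

    # One time step of random links on the complete graph with p = 0.9.
    active = sample_active_links(complete16, 0.9, rng)
    print("Active links this step:", active.sum() // 2, "of", complete16.sum() // 2)
```

Up to floating-point error, the algebraic connectivity is 2 for the Petersen graph and 16 for the complete graph on 16 nodes, which illustrates how much the base graphs in a study like this can differ in connectivity before any links are dropped.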