Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Distributed Multi-Agent Bandits Over Erdős-Rényi Random Networks
Authors: Jingyuan Liu, Hao Qiu, Lin F. Yang, Mengfan Xu
NeurIPS 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we demonstrate the effectiveness of our algorithm through numerical experiments on both synthetic and real-world datasets. The objective is twofold. First, we show that the cumulative regret of our algorithm grows logarithmically with respect to $T$ and is significantly smaller than that of existing benchmarks, thereby validating our theoretical findings. We use DrFed-UCB, proposed by [Xu and Klabjan, 2023], as the baseline. Second, we conduct a simulation study to examine how the regret depends on the link probability $p$ and the algebraic connectivity of the base graph $G$, as reflected in the regret bound. |
| Researcher Affiliation | Academia | Jingyuan Liu Nanjing University EMAIL Hao Qiu Università degli Studi di Milano EMAIL Lin Yang Nanjing University EMAIL Mengfan Xu University of Massachusetts Amherst EMAIL |
| Pseudocode | Yes | Algorithm 1 Gossip Successive Elimination for Agent $i \in [N]$ |
| Open Source Code | Yes | The code for the experiments is available at https://github.com/haoqiu95/multi-agent-bandit. |
| Open Datasets | Yes | For real-world experiments, we use the MovieLens dataset and refer to Yi and Vojnovic [2023] for details. |
| Dataset Splits | No | For the synthetic experiments, we set $T = 10000$, $N = 16$, and $K = 5$; for the Petersen graph, we use $N = 10$ by definition. For the comparison with DrFed-UCB, we consider a complete graph and a high link probability ($p = 0.9$), as required therein. Before the game starts, we sample each $q_i$ independently and uniformly from the interval $[0, 1]$ for each agent $i$. The local mean reward of arm $k$ on agent $i$ is given by $\mu_{i,k} = q_i \frac{k-1}{K-1}$, and the global mean reward of arm $k$ is $\mu_k = \frac{k-1}{N}$. At each time step $t$, each agent $i \in [N]$ selects an arm and observes the local reward. For real-world experiments, we use the MovieLens dataset and refer to Yi and Vojnovic [2023] for details. We set the horizon $T = 10{,}000$, and select 20 users as agents ($N = 20$) and 5 genres as arms ($K = 5$). At each time step $t$, each agent randomly selects a movie from the genres. All ratings (rewards) of movies are normalised to $[0, 1]$. |
| Hardware Specification | No | Our experiments can be run on a laptop, and thus our paper does not include any further information on computer resources. |
| Software Dependencies | No | The paper does not specify any software dependencies with version numbers used for the experiments. |
| Experiment Setup | Yes | Experimental Settings. For the synthetic experiments, we set $T = 10000$, $N = 16$, and $K = 5$; for the Petersen graph, we use $N = 10$ by definition. For the comparison with DrFed-UCB, we consider a complete graph and a high link probability ($p = 0.9$), as required therein. Before the game starts, we sample each $q_i$ independently and uniformly from the interval $[0, 1]$ for each agent $i$. The local mean reward of arm $k$ on agent $i$ is given by $\mu_{i,k} = q_i \frac{k-1}{K-1}$, and the global mean reward of arm $k$ is $\mu_k = \frac{k-1}{N}$. At each time step $t$, each agent $i \in [N]$ selects an arm and observes the local reward. For real-world experiments, we use the MovieLens dataset and refer to Yi and Vojnovic [2023] for details. We set the horizon $T = 10{,}000$, and select 20 users as agents ($N = 20$) and 5 genres as arms ($K = 5$). At each time step $t$, each agent randomly selects a movie from the genres. All ratings (rewards) of movies are normalised to $[0, 1]$. |
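
The synthetic setting quoted in the Experiment Setup row can be illustrated with a minimal sketch. It is not taken from the authors' repository: the function names `complete_graph` and `sample_active_links` are hypothetical, and it assumes that each edge of the base graph is kept independently with the link probability p whenever a communication graph is drawn. Only the values N = 16, K = 5, T = 10000, p = 0.9 and the uniform sampling of each q_i come from the quoted text.

```python
import numpy as np


def complete_graph(n_agents):
    """Boolean adjacency matrix of the complete graph on n_agents nodes, no self-loops."""
    return np.ones((n_agents, n_agents), dtype=bool) & ~np.eye(n_agents, dtype=bool)


def sample_active_links(base_adj, p, rng):
    """Keep each edge of the base graph independently with probability p.

    Assumption (not confirmed by the source): links are sampled independently
    per unordered pair, in the usual Erdos-Renyi fashion.
    """
    n = base_adj.shape[0]
    # One coin per unordered pair (strict upper triangle), mirrored so that
    # the sampled graph stays undirected.
    coins = np.triu(rng.random((n, n)) < p, k=1)
    keep = coins | coins.T
    return base_adj & keep


# Values quoted in the Experiment Setup row.
N, K, T, p = 16, 5, 10_000, 0.9

rng = np.random.default_rng(0)          # the seed is an arbitrary choice
q = rng.uniform(0.0, 1.0, size=N)       # one q_i per agent, drawn before the game starts
base = complete_graph(N)                # base graph used for the baseline comparison
active = sample_active_links(base, p, rng)  # one realisation of the random network
```

In a full simulation one would presumably call `sample_active_links` once per time step, so that agents only exchange information over the links present in that round. Swapping `complete_graph` for another adjacency matrix (for example the Petersen graph with N = 10) leaves the sampling step unchanged.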