Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. [K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026].
Mean-Field Sampling for Cooperative Multi-Agent Reinforcement Learning
Authors: Emile Anand, Ishani Karmarkar, Guannan Qu
NeurIPS 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | While our results are theoretical in nature, we hope SUBSAMPLE-MFQ will further exploration of sampling in Markov games, and potentially inspire new practical multi-agent algorithms. Appendix B presents numerical simulations on the performance of SUBSAMPLE-MFQ on the Gaussian squeeze and constrained exploration tasks. |
| Researcher Affiliation | Academia | Emile Anand, Georgia Institute of Technology, Atlanta, GA 30308; Ishani Karmarkar, Stanford University, Palo Alto, CA 94305; Guannan Qu, Carnegie Mellon University, Pittsburgh, PA 94035 |
| Pseudocode | Yes | Algorithm 1, SUBSAMPLE-MFQ: Learning. Require: a multi-agent system as in Section 2, number of iterations $T$, sampling parameters $k \in [n]$ and $m \in \mathbb{N}$, and discount factor $\gamma \in (0, 1)$. Algorithm 2, SUBSAMPLE-MFQ: Execution. Require: parameter $T$ for the number of iterations of the decision-making sequence; sampling parameters $k \in [n]$ and $m \in \mathbb{N}$; discount factor $\gamma$; policy $\hat{\pi}^{\mathrm{est}}_{k,m}(s_g, F_{s_\Delta})$. |
| Open Source Code | Yes | We provide supporting code for the algorithm and experiments in https://github.com/emiletimothy/Mean-Field-Subsample-Q-Learning |
| Open Datasets | No | Motivating examples. Below we give examples of two cooperative MARL settings which are naturally modeled by our setting. Our experiments in Appendix B reveal a monotonic improvement in the learned policies as $k \to n$, while providing a substantial speedup over mean-field Q-learning. Gaussian squeeze: In this task, $n$ homogeneous agents determine individual actions $a_i$ to jointly maximize the objective $r(x) = x\,e^{-(x-\mu)^2/\sigma^2}$, where $x = \sum_{i=1}^{n} a_i$, and $\mu$ and $\sigma$ are the predefined mean and variance of the system. Constrained exploration: Consider an $M \times M$ grid. Each agent's state is a coordinate in $[M] \times [M]$. The state represents the center of a $d \times d$ box where the global agent constrains the local agents' movements. |
| Dataset Splits | No | For this task, we ran a simulation with n = 8 agents, with m = 20 samples in the empirically adapted Bellman operator. We provide simulation results in Figure 2a. For this task, we ran a small-scale simulation with n = 8 agents, and a large-scale simulation with n = 50 agents, and used m = 20 samples in the empirical Bellman operator. |
| Hardware Specification | Yes | All experiments were run on a 2-core CPU server with 12GB RAM. |
| Software Dependencies | No | The paper does not explicitly mention specific software dependencies with version numbers. |
| Experiment Setup | Yes | For this task, we ran a simulation with n = 8 agents, with m = 20 samples in the empirically adapted Bellman operator. We provide simulation results in Figure 2a. For this task, we ran a small-scale simulation with n = 8 agents, and a large-scale simulation with n = 50 agents, and used m = 20 samples in the empirical Bellman operator. |
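The Gaussian squeeze objective quoted in the Open Datasets row can be evaluated directly from the agents' actions. The following is a minimal Python sketch, not code from the authors' repository; the function name `gaussian_squeeze_reward` is hypothetical, and it assumes the exponent is $-(x-\mu)^2/\sigma^2$, the usual Gaussian form.

```python
import math
from typing import Sequence


def gaussian_squeeze_reward(actions: Sequence[float], mu: float, sigma: float) -> float:
    """Shared reward r(x) = x * exp(-(x - mu)^2 / sigma^2) with x = sum_i a_i.

    Hypothetical helper for illustration; the exponent sign is an assumption.
    """
    x = sum(actions)  # aggregate action of all n homogeneous agents
    return x * math.exp(-((x - mu) ** 2) / sigma ** 2)


# 8 agents each contributing 0.5 give x = 4.0 = mu, so the exponential factor is 1.
print(gaussian_squeeze_reward([0.5] * 8, mu=4.0, sigma=1.0))  # 4.0
```

Because every agent's action enters only through the sum $x$, the reward is permutation-invariant in the agents, which is the kind of homogeneous structure a mean-field method can exploit.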
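The constrained-exploration task confines local agents to a $d \times d$ box whose center is given by a state (presumably the global agent's) on an $M \times M$ grid. One plausible reading is sketched below in Python under our own assumptions: coordinates are 1-indexed in $[M] \times [M]$, $d$ is odd, and any move that would leave the box or the grid is clamped back to its edge. The function `constrained_step` is hypothetical, and the paper's actual transition rule may differ.

```python
from typing import Tuple

Coord = Tuple[int, int]


def constrained_step(pos: Coord, move: Coord, center: Coord, M: int, d: int) -> Coord:
    """Apply a local agent's move, then clamp it to the d x d box around `center`
    and to the 1-indexed grid [M] x [M].

    Hypothetical clamping rule; assumes d is odd so the box is centred exactly.
    """
    half = d // 2
    new = []
    for p, dp, c in zip(pos, move, center):
        lo = max(1, c - half)  # lower box edge, never below the grid
        hi = min(M, c + half)  # upper box edge, never above the grid
        new.append(min(max(p + dp, lo), hi))
    return (new[0], new[1])


# A move to the right is blocked at the box edge x = 6 when center = (5, 5), d = 3.
print(constrained_step((6, 5), (1, 0), center=(5, 5), M=10, d=3))  # (6, 5)
```

Clamping is only one way to enforce the constraint; rejecting the move or penalizing it in the reward would be equally consistent with the one-sentence description.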
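The execution policy in the Pseudocode row takes the global state $s_g$ and a summary of $k$ subsampled local agents, written here as the empirical distribution $F_{s_\Delta}$ over the states of the sampled subset $\Delta$. A minimal Python sketch of that summary follows; it assumes $\Delta$ is drawn uniformly without replacement and that $F_{s_\Delta}(x)$ is the fraction of sampled agents in state $x$. The function `subsample_mean_field` is hypothetical and not taken from the authors' code.

```python
import random
from collections import Counter
from collections.abc import Hashable, Sequence
from typing import Optional


def subsample_mean_field(local_states: Sequence[Hashable], k: int,
                         rng: Optional[random.Random] = None) -> dict[Hashable, float]:
    """Draw k of the n local agents uniformly without replacement (the set Delta)
    and return F(x) = |{i in Delta : s_i = x}| / k.

    Hypothetical helper; the sampling scheme is an assumption.
    """
    rng = rng or random.Random()
    n = len(local_states)
    if not 1 <= k <= n:
        raise ValueError("k must lie in [n] = {1, ..., n}")
    delta = rng.sample(range(n), k)  # indices of the sampled local agents
    counts = Counter(local_states[i] for i in delta)
    return {x: c / k for x, c in counts.items()}


# With k = n every agent is sampled, so F is the full empirical distribution.
print(sorted(subsample_mean_field(["a", "a", "b", "c"], k=4).items()))
# [('a', 0.5), ('b', 0.25), ('c', 0.25)]
```

The point of the subsample is that the summary has support over at most $k$ agents, so a table indexed by it grows with $k$ rather than with $n$.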
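The Experiment Setup row reports $m = 20$ samples in the empirical Bellman operator. A generic tabular version of such an operator replaces the expectation over next states with an average over $m$ next states drawn from a generative model: $\hat{T}Q(s,a) = r(s,a) + \gamma \frac{1}{m}\sum_{j=1}^{m} \max_{a'} Q(s'_j, a')$. The Python sketch below is our illustration, not the authors' implementation; in SUBSAMPLE-MFQ the state would be the pair $(s_g, F_{s_\Delta})$, while here it is any hashable value. The helper `sample_next_state`, the dictionary-based Q-table, and the default of 0 for unseen state-action pairs are all assumptions.

```python
from collections.abc import Callable, Hashable, Sequence

State = Hashable
Action = Hashable


def empirical_bellman_backup(
    Q: dict[tuple[State, Action], float],
    state: State,
    action: Action,
    reward: float,
    sample_next_state: Callable[[State, Action], State],  # hypothetical generative model
    actions: Sequence[Action],
    gamma: float,
    m: int = 20,
) -> float:
    """reward + gamma * (1/m) * sum_j max_a' Q(s'_j, a'), with s'_j ~ sample_next_state."""
    total = 0.0
    for _ in range(m):
        s_next = sample_next_state(state, action)
        total += max(Q.get((s_next, a), 0.0) for a in actions)  # unseen pairs count as 0
    return reward + gamma * total / m


# Deterministic toy model: every sample lands in "s1", whose best action value is 2.0.
Q = {("s1", "L"): 1.0, ("s1", "R"): 2.0}
print(empirical_bellman_backup(Q, "s0", "R", reward=1.0,
                               sample_next_state=lambda s, a: "s1",
                               actions=["L", "R"], gamma=0.5))  # 2.0
```

Larger $m$ lowers the variance of each backup at the cost of $m$ generative-model calls per update, which is the trade-off the $m = 20$ setting fixes.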