Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. [K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026].

Mean-Field Sampling for Cooperative Multi-Agent Reinforcement Learning

Authors: Emile Anand, Ishani Karmarkar, Guannan Qu

NeurIPS 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	While our results are theoretical in nature, we hope SUBSAMPLE-MFQ will further exploration of sampling in Markov games, and potentially inspire new practical multi-agent algorithms. Appendix B presents numerical simulations on the performance of SUBSAMPLE-MFQ on the Gaussian squeeze and constrained exploration tasks.
Researcher Affiliation	Academia	Emile Anand, Georgia Institute of Technology, Atlanta, GA 30308; Ishani Karmarkar, Stanford University, Palo Alto, CA 94305; Guannan Qu, Carnegie Mellon University, Pittsburgh, PA 94035
Pseudocode	Yes	Algorithm 1 SUBSAMPLE-MFQ: Learning. Require: a multi-agent system as in Section 2, number of iterations T, sampling parameters k ∈ [n] and m ∈ ℕ, and discount factor γ ∈ (0, 1). Algorithm 2 SUBSAMPLE-MFQ: Execution. Require: parameter T for the number of iterations of the decision-making sequence, sampling parameters k ∈ [n] and m ∈ ℕ, discount factor γ, and policy π̂^est_{k,m}(s_g, F_s).
Open Source Code	Yes	We provide supporting code for the algorithm and experiments in https://github.com/emiletimothy/Mean-Field-Subsample-Q-Learning
Open Datasets	No	Motivating examples. Below we give examples of two cooperative MARL settings which are naturally modeled by our setting. Our experiments in Appendix B reveal a monotonic improvement in the learned policies as k → n, while providing a substantial speedup over mean-field Q-learning. Gaussian squeeze: In this task, n homogeneous agents determine individual actions a_i to jointly maximize the objective r(x) = x·e^{−(x−µ)²/σ²}, where x = Σ_{i=1}^{n} a_i, and µ and σ are the predefined mean and variance of the system. Constrained exploration: Consider an M × M grid. Each agent's state is a coordinate in [M] × [M]. The state represents the center of a d × d box where the global agent constrains the local agents' movements.
Dataset Splits	No	For this task, we ran a simulation with n = 8 agents, with m = 20 samples in the empirically adapted Bellman operator. We provide simulation results in Figure 2a. For this task, we ran a small-scale simulation with n = 8 agents, and a large-scale simulation with n = 50 agents, and used m = 20 samples in the empirical Bellman operator.
Hardware Specification	Yes	All experiments were run on a 2-core CPU server with 12GB RAM.
Software Dependencies	No	The paper does not explicitly mention specific software dependencies with version numbers.
Experiment Setup	Yes	For this task, we ran a simulation with n = 8 agents, with m = 20 samples in the empirically adapted Bellman operator. We provide simulation results in Figure 2a. For this task, we ran a small-scale simulation with n = 8 agents, and a large-scale simulation with n = 50 agents, and used m = 20 samples in the empirical Bellman operator.
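The Gaussian squeeze objective quoted in the Open Datasets row can be evaluated directly. The sketch below is illustrative only and is not taken from the authors' repository; the function names are hypothetical. The closed-form maximizer follows from setting r′(x) = 0, which gives 2x² − 2µx − σ² = 0, whose positive root is x* = (µ + √(µ² + 2σ²))/2.

```python
import math
from typing import Sequence


def gaussian_squeeze_reward(actions: Sequence[float], mu: float, sigma: float) -> float:
    """Team reward r(x) = x * exp(-(x - mu)^2 / sigma^2), where x is the sum of all agents' actions."""
    x = sum(actions)
    return x * math.exp(-((x - mu) ** 2) / sigma ** 2)


def optimal_total_action(mu: float, sigma: float) -> float:
    """Positive root of 2x^2 - 2*mu*x - sigma^2 = 0, i.e. the stationary point of r with x > 0."""
    return (mu + math.sqrt(mu ** 2 + 2 * sigma ** 2)) / 2


# Illustrative values (hypothetical): n = 8 agents each playing 0.5, with mu = 4 and sigma = 2.
print(gaussian_squeeze_reward([0.5] * 8, mu=4.0, sigma=2.0))  # x = 4 = mu, so r = 4.0
print(optimal_total_action(4.0, 2.0))
```

Note that the reward peaks slightly above x = µ, because the linear factor x keeps pulling the optimum to the right of the Gaussian's center.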
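The constrained exploration task can be read as projecting each local agent's move into the d × d box centered on the global agent's coordinate. The sketch below is a hypothetical reading, not the authors' environment code. It assumes 0-indexed coordinates in {0, …, M − 1}, an odd box width d, a five-move action set, and that the box is clipped at the grid edges.

```python
from typing import Dict, Tuple

Coord = Tuple[int, int]

# Hypothetical action set (assumption): stay plus the four cardinal moves.
MOVES: Dict[str, Coord] = {
    "stay": (0, 0),
    "up": (0, 1),
    "down": (0, -1),
    "left": (-1, 0),
    "right": (1, 0),
}


def clamp(v: int, lo: int, hi: int) -> int:
    """Clip v into the closed interval [lo, hi]."""
    return max(lo, min(hi, v))


def constrained_step(local: Coord, move: str, global_center: Coord, M: int, d: int) -> Coord:
    """Apply a local agent's move, then project it into the box around the global agent.

    Assumes odd d, so the box extends (d - 1) // 2 cells on each side of the center,
    and intersects the box with the M x M grid so that the result stays on the grid.
    """
    half = (d - 1) // 2
    dx, dy = MOVES[move]
    x, y = local[0] + dx, local[1] + dy
    cx, cy = global_center
    x = clamp(x, max(0, cx - half), min(M - 1, cx + half))
    y = clamp(y, max(0, cy - half), min(M - 1, cy + half))
    return (x, y)


# A 10 x 10 grid with a 3 x 3 box centered at (2, 2): a move to (4, 2) is pulled back to (3, 2).
print(constrained_step((3, 2), "right", (2, 2), M=10, d=3))
```

Projecting after the move (rather than forbidding moves) also pulls agents back inside the box when the global agent relocates; whether the original environment behaves this way is not stated in the quoted text.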
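Algorithms 1 and 2 rely on two ingredients named in the quoted evidence: k subsampled local agents, summarized by the empirical distribution of their states, and an empirical Bellman operator that averages over m sampled transitions (m = 20 in the reported runs). The sketch below illustrates both on a generic finite MDP supplied by the caller. It is a simplification under stated assumptions, not SUBSAMPLE-MFQ as implemented in the authors' repository: the helper names, the counts-tuple encoding of the distribution, and the synchronous value-iteration loop are all hypothetical.

```python
import random
from collections import Counter
from typing import Callable, Dict, Hashable, List, Sequence, Tuple

State = Hashable
Action = Hashable


def empirical_distribution(local_states: Sequence[int], k: int, rng: random.Random) -> Tuple[Tuple[int, int], ...]:
    """Sample k of the n local agents without replacement and return their state counts.

    The sorted (state, count) tuple is a hashable stand-in (assumption) for the
    empirical distribution of the subsampled agents' states.
    """
    sampled = rng.sample(list(local_states), k)
    return tuple(sorted(Counter(sampled).items()))


def empirical_value_iteration(
    states: List[State],
    actions: List[Action],
    reward: Callable[[State, Action], float],
    sample_next: Callable[[State, Action, random.Random], State],
    gamma: float,
    m: int,
    T: int,
    seed: int = 0,
) -> Dict[Tuple[State, Action], float]:
    """Run T synchronous rounds of Q(s, a) <- r(s, a) + gamma * (1/m) * sum_j max_b Q(s'_j, b).

    Each s'_j is drawn from the caller-supplied simulator sample_next, so the
    expectation in the exact Bellman operator is replaced by an m-sample average.
    """
    rng = random.Random(seed)
    Q = {(s, a): 0.0 for s in states for a in actions}
    for _ in range(T):
        new_Q = {}
        for s in states:
            for a in actions:
                total = 0.0
                for _ in range(m):
                    s_next = sample_next(s, a, rng)
                    total += max(Q[(s_next, b)] for b in actions)
                new_Q[(s, a)] = reward(s, a) + gamma * total / m
        Q = new_Q
    return Q


# Toy usage (hypothetical MDP): two states, two actions, unit reward, uniformly random transitions.
Q = empirical_value_iteration(
    states=[0, 1],
    actions=[0, 1],
    reward=lambda s, a: 1.0,
    sample_next=lambda s, a, rng: rng.choice([0, 1]),
    gamma=0.5,
    m=20,
    T=3,
)
print(Q[(0, 0)])  # 1 + 0.5 + 0.25 = 1.75
```

In a setting like the paper's, each state would pair the global agent's state with a summary such as the output of `empirical_distribution`; here the state space is left abstract so the averaging step stays visible.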
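One way to see why subsampling can speed up mean-field Q-learning: a mean-field table is indexed by the empirical distribution of agent states, and the number of distinct distributions grows with the number of agents being counted. By a standard stars-and-bars argument, k agents over a finite set of |S_l| local states admit C(k + |S_l| − 1, |S_l| − 1) distinct histograms. The sketch assumes a finite local state space; |S_l|, the function name, and the example sizes are illustrative and not taken from the paper.

```python
from math import comb


def count_empirical_distributions(num_agents: int, num_local_states: int) -> int:
    """Number of distinct state histograms of num_agents agents over num_local_states states."""
    return comb(num_agents + num_local_states - 1, num_local_states - 1)


# Illustrative sizes (hypothetical): 3 local states, all n = 50 agents versus k = 8 subsampled agents.
full = count_empirical_distributions(50, 3)  # C(52, 2) = 1326
sub = count_empirical_distributions(8, 3)    # C(10, 2) = 45
print(full, sub)
```

The count is polynomial in the number of agents for a fixed local state space, so replacing n with k shrinks the table roughly by a factor of (n/k)^(|S_l| − 1).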