Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. [K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026].

Learning in Stackelberg Mean Field Games: A Non-Asymptotic Analysis

Authors: Sihan Zeng, Benjamin Patrick Evans, Sujay Bhatt, Leo Ardon, Sumitra Ganesh, Alec Koppel

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Simulation results in a range of well-established economics environments demonstrate that AC-SMFG outperforms existing multi-agent and MFG learning baselines in policy quality and convergence speed. ... We conduct a comprehensive evaluation of the proposed methodology across a diverse set of canonical MFGs. ... The convergence of the leader and follower rewards is compared in Figure 2.
Researcher Affiliation	Industry	Sihan Zeng (1), Benjamin Patrick Evans (2), Sujay Bhatt (1), Leo Ardon (2), Sumitra Ganesh (1), Alec Koppel (1). (1) J.P.Morgan AI Research, United States; (2) J.P.Morgan AI Research, United Kingdom.
Pseudocode	Yes	Algorithm 1 Single loop Actor-Critic Algorithm for Stackelberg Mean Field Games (AC-SMFG) Algorithm 2 Actor-Critic Algorithm for Hierarchical Mean Field Games (Simplified for Analysis)
Open Source Code	Yes	All source code is available in the supplementary material.
Open Datasets	Yes	Specifically, we extend three environments from MFGLib [Guo et al., 2023a], each exhibiting varying degrees of complexity.
Dataset Splits	No	The paper uses simulation environments and mentions
Hardware Specification	Yes	All approaches are run on a CPU, with Python3, and on an Amazon EC2 with R6i.large.
Software Dependencies	No	All approaches are run on a CPU, with Python3, and on an Amazon EC2 with R6i.large. ... For the PPO implementation, we base the implementation off Clean RL (in Torch)... ADAM is used as the optimiser (as implemented in torch).
Experiment Setup	Yes	Proposed. The proposed is run with ζ0 = 0.5, α0 = 0.25, β0 = 0.02, ξ0 = 0.25. PPO. For the PPO implementation, we base the implementation off Clean RL (in Torch), using a batch size of 256, hidden layer shape of (64, 64), learning rate of 3e-4, Tanh activation functions for the hidden layers, and a clipping epsilon of 0.2. ADAM is used as the optimiser (as implemented in torch).
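
For readers who want to record the reported setup in machine-readable form, the hyperparameters in the Experiment Setup row could be captured roughly as follows. This is a minimal sketch, not the authors' code: the class and field names are hypothetical, the values are copied from the row above, and the learning rate is assumed to be 3e-4.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class ProposedStepSizes:
    """Initial step sizes quoted for the proposed method (AC-SMFG).

    Field names are hypothetical; they mirror the symbols in the table.
    """
    zeta0: float = 0.5
    alpha0: float = 0.25
    beta0: float = 0.02
    xi0: float = 0.25


@dataclass(frozen=True)
class PPOBaselineConfig:
    """Settings quoted for the PPO baseline.

    Field names are hypothetical; the learning rate of 3e-4 is an assumption
    about how the reported value should be read.
    """
    batch_size: int = 256
    hidden_sizes: tuple = (64, 64)
    learning_rate: float = 3e-4
    activation: str = "tanh"
    clip_epsilon: float = 0.2
    optimizer: str = "adam"


if __name__ == "__main__":
    # Print both configurations as plain dictionaries for logging.
    print(asdict(ProposedStepSizes()))
    print(asdict(PPOBaselineConfig()))
```

Frozen dataclasses are used here only so that the recorded values cannot be changed by accident; any equivalent configuration format would serve the same purpose.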