Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
Learning in Stackelberg Mean Field Games: A Non-Asymptotic Analysis
Authors: Sihan Zeng, Benjamin Patrick Evans, Sujay Bhatt, Leo Ardon, Sumitra Ganesh, Alec Koppel
NeurIPS 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Simulation results in a range of well-established economics environments demonstrate that AC-SMFG outperforms existing multi-agent and MFG learning baselines in policy quality and convergence speed. ... We conduct a comprehensive evaluation of the proposed methodology across a diverse set of canonical MFGs. ... The convergence of the leader and follower rewards is compared in Figure 2. |
| Researcher Affiliation | Industry | Sihan Zeng¹, Benjamin Patrick Evans², Sujay Bhatt¹, Leo Ardon², Sumitra Ganesh¹, Alec Koppel¹; ¹J.P.Morgan AI Research, United States; ²J.P.Morgan AI Research, United Kingdom; EMAIL |
| Pseudocode | Yes | Algorithm 1 Single loop Actor-Critic Algorithm for Stackelberg Mean Field Games (AC-SMFG) Algorithm 2 Actor-Critic Algorithm for Hierarchical Mean Field Games (Simplified for Analysis) |
| Open Source Code | Yes | All source code is available in the supplementary material. |
| Open Datasets | Yes | Specifically, we extend three environments from MFGLib [Guo et al., 2023a], each exhibiting varying degrees of complexity. |
| Dataset Splits | No | The paper uses simulation environments and mentions |
| Hardware Specification | Yes | All approaches are run on a CPU, with Python3, and on an Amazon EC2 with R6i.large. |
| Software Dependencies | No | All approaches are run on a CPU, with Python3, and on an Amazon EC2 with R6i.large. ... For the PPO implementation, we base the implementation off CleanRL (in Torch)... ADAM is used as the optimiser (as implemented in torch). |
| Experiment Setup | Yes | Proposed. The proposed is run with ζ₀ = 0.5, α₀ = 0.25, β₀ = 0.02, ξ₀ = 0.25. PPO. For the PPO implementation, we base the implementation off CleanRL (in Torch), using a batch size of 256, hidden layer shape of (64, 64), learning rate of 3e-4, Tanh activation functions for the hidden layers, and a clipping epsilon of 0.2. ADAM is used as the optimiser (as implemented in torch). |
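
The hyperparameters quoted in the Experiment Setup row can be collected into a small configuration sketch, which may help when trying to reproduce the reported runs. This is only an illustration: the class and field names (`ACSMFGStepSizes`, `PPOBaselineConfig`, `check_ppo_config`) are hypothetical and do not come from the paper or its supplementary code, and the learning rate is read here as 3e-4.

```python
from dataclasses import dataclass, asdict
from typing import Tuple


@dataclass(frozen=True)
class ACSMFGStepSizes:
    """Initial step sizes quoted for the proposed method (field names are hypothetical)."""
    zeta0: float = 0.5
    alpha0: float = 0.25
    beta0: float = 0.02
    xi0: float = 0.25


@dataclass(frozen=True)
class PPOBaselineConfig:
    """PPO baseline settings as quoted in the table (field names are hypothetical)."""
    batch_size: int = 256
    hidden_sizes: Tuple[int, ...] = (64, 64)
    learning_rate: float = 3e-4   # assumption: "3e 4" in the extracted text means 3e-4
    activation: str = "tanh"
    clip_epsilon: float = 0.2
    optimizer: str = "adam"


def check_ppo_config(cfg: PPOBaselineConfig) -> None:
    """Basic sanity checks; raises ValueError on an obviously invalid setting."""
    if cfg.batch_size <= 0:
        raise ValueError("batch_size must be positive")
    if not 0.0 < cfg.clip_epsilon < 1.0:
        raise ValueError("clip_epsilon must lie in (0, 1)")
    if cfg.learning_rate <= 0.0:
        raise ValueError("learning_rate must be positive")
    if any(h <= 0 for h in cfg.hidden_sizes):
        raise ValueError("hidden layer sizes must be positive")


if __name__ == "__main__":
    ppo = PPOBaselineConfig()
    check_ppo_config(ppo)
    print(asdict(ACSMFGStepSizes()))
    print(asdict(ppo))
```

Keeping the values in frozen dataclasses makes accidental modification during a run raise an error, and `asdict` gives a plain dictionary that can be logged alongside results.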