Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
A Game-Theoretic Framework for Managing Risk in Multi-Agent Systems
Authors: Oliver Slumbers, David Henry Mguni, Stefano B Blumberg, Stephen Marcus McAleer, Yaodong Yang, Jun Wang
ICML 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Theoretically and empirically, we show RAE shares many properties with a Nash Equilibrium (NE), establishing convergence properties and generalising to risk-dominant NE in certain cases. ... We empirically demonstrate the minimum reward variance benefits of RAE in matrix games with high-risk outcomes. Results on MARL experiments show RAE generalises to risk-dominant NE in a trust dilemma game and that it reduces instances of crashing by 7x in an autonomous driving setting versus the best performing baseline. |
| Researcher Affiliation | Collaboration | (1) University College London, London, UK; (2) Huawei Technologies, London, UK; (3) Independent Researcher; (4) Peking University, Beijing, China. |
| Pseudocode | Yes | Appendix F. Pseudo-code includes 'Algorithm 1 SFP' and 'Algorithm 2 PSRO-RAE'. |
| Open Source Code | No | The paper does not include an explicit statement about releasing the code for the described methodology or provide a direct link to a code repository for their implementation. |
| Open Datasets | Yes | Our stag-hunt environment is taken from (Peysakhovich & Lerer, 2018)... Our driving environment is based on the two-way environment from (Leurent, 2018). |
| Dataset Splits | No | The paper mentions '50 episodes over 5 seeds for intra-distribution testing' for its experiments, but does not specify explicit training, validation, or test dataset splits (e.g., percentages or sample counts) for reproducibility. |
| Hardware Specification | Yes | All experiments run on one machine with: AMD Ryzen Threadripper 3960X 24 Core; 1 x NVIDIA GeForce RTX 3090 |
| Software Dependencies | Yes | PPO HYPERPARAMS DEFAULT SB3 (RAFFIN ET AL., 2021) from Table 1, and the reference 'Raffin, A., Hill, A., Gleave, A., Kanervisto, A., Ernestus, M., and Dormann, N. Stable-baselines3: Reliable reinforcement learning implementations. Journal of Machine Learning Research, 22(268):1-8, 2021.' |
| Experiment Setup | Yes | Appendix G. Hyperparameter Settings for our experiments provides detailed settings, including 'FP ITERATIONS 100', 'TREMBLE PROBABILITY 0.001', 'LEARNING RATE 0.005', 'RAE GAMMA 0.1, 0.5', among many others in Table 1. |