Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

A Principle of Targeted Intervention for Multi-Agent Reinforcement Learning

Authors: Anjie Liu, Jianhong Wang, Samuel Kaski, Jun Wang, Mengyue Yang

NeurIPS 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In experiments, we demonstrate the effectiveness of our proposed targeted intervention, and verify the result of relevance graph analysis. The proposed PSI is evaluated in Multi-Agent Particle Environment (MPE) and Hanabi [19, 20].
Researcher Affiliation | Academia | Anjie Liu, Thrust of Artificial Intelligence, HKUST (GZ); Jianhong Wang, INFORMED-AI Hub, University of Bristol; Samuel Kaski, ELLIS Institute Finland; Jun Wang, Centre for Artificial Intelligence, University College London; Mengyue Yang, School of Engineering Mathematics and Technology, University of Bristol
Pseudocode | Yes | Algorithm 1: Graph Embedding using GNNs; Algorithm 2: 5 Save Convention; Algorithm 3: The Chop Convention
Open Source Code | Yes | Our code is publicly available as an open-source repository. [Footnote 2: https://github.com/iamlilAJ/Pre-Strategy-Intervention]
Open Datasets | Yes | The proposed PSI is evaluated in Multi-Agent Particle Environment (MPE) and Hanabi [19, 20]. [19] Ryan Lowe, Yi Wu, Aviv Tamar, Jean Harb, Pieter Abbeel, and Igor Mordatch. Multi-agent actor-critic for mixed cooperative-competitive environments. Neural Information Processing Systems (NIPS), 2017. [20] Nolan Bard, Jakob N. Foerster, Sarath Chandar, Neil Burch, Marc Lanctot, H. Francis Song, Emilio Parisotto, Vincent Dumoulin, Subhodeep Moitra, Edward Hughes, Iain Dunning, Shibl Mourad, Hugo Larochelle, Marc G. Bellemare, and Michael Bowling. The Hanabi challenge: A new frontier for AI research. Artificial Intelligence, 280:103216, March 2020.
Dataset Splits | Yes | Results from 5 random seeds are reported as means with 95% confidence intervals. We test the following three noise scenarios to assess robustness under different train-test conditions: (1) Noise during Training and Testing. (2) Noise during Training Only. (3) Noise during Testing Only.
Hardware Specification | Yes | Our experiments were run on NVIDIA RTX 4090 and A100 GPUs.
Software Dependencies | No | The implementation is mainly based on JaxMARL [86].
Experiment Setup | Yes | The following paragraphs detail the architecture of our proposed method and the baselines used for comparison. Subsequently, we describe how the additional desired outcome is defined and implemented within the MPE and Hanabi environments respectively. Section K (Hyperparameters): Table 2: Hyperparameters for MPE in IQL across two scenarios. Table 3: Hyperparameters for MPE in VDN across two scenarios. Table 4: Hyperparameters for MPE in QMIX. Table 5: Hyperparameters for Independent PPO in MPE. Table 6: Hyperparameters for PQN in Hanabi. Table 7: Hyperparameters for IPPO in Hanabi. Table 8: Hyperparameters for MAPPO in Hanabi.
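
The Dataset Splits entry quotes results reported as means with 95% confidence intervals over 5 random seeds, but the quoted text does not say how the intervals were computed. As a minimal sketch, assuming a two-sided Student-t interval on the per-seed scores (a common choice for small samples, not necessarily the authors' procedure), the calculation could look like this. The function name mean_ci95 and the per-seed values are hypothetical and purely illustrative.

```python
import math
import statistics

# Two-sided 95% critical values of Student's t, keyed by degrees of freedom.
# Only small sample sizes are covered; 5 seeds means 4 degrees of freedom.
T_CRIT_95 = {1: 12.706, 2: 4.303, 3: 3.182, 4: 2.776, 5: 2.571,
             6: 2.447, 7: 2.365, 8: 2.306, 9: 2.262}


def mean_ci95(values):
    """Return (mean, half_width) of an assumed Student-t 95% interval."""
    n = len(values)
    dof = n - 1
    if dof not in T_CRIT_95:
        raise ValueError("this sketch supports between 2 and 10 values")
    mean = statistics.mean(values)
    # Standard error of the mean, using the sample standard deviation.
    sem = statistics.stdev(values) / math.sqrt(n)
    return mean, T_CRIT_95[dof] * sem


# Hypothetical final scores from 5 seeds (not taken from the paper).
seed_scores = [10.0, 12.0, 11.0, 13.0, 9.0]
mean, half_width = mean_ci95(seed_scores)
print(f"{mean:.2f} +/- {half_width:.2f}")
```

With only 5 seeds the t critical value (2.776) is noticeably larger than the normal-approximation value of 1.96, so the choice of method changes the reported interval width. Readers comparing intervals across papers should check which convention each one uses.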