Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
Situation-Dependent Causal Influence-Based Cooperative Multi-Agent Reinforcement Learning
Authors: Xiao Du, Yutong Ye, Pengyu Zhang, Yaning Yang, Mingsong Chen, Ting Wang
AAAI 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results on various MARL benchmarks demonstrate the superiority of our method compared to state-of-the-art approaches. |
| Researcher Affiliation | Academia | Xiao Du, Yutong Ye, Pengyu Zhang, Yaning Yang, Mingsong Chen, Ting Wang* Software Engineering Institute, East China Normal University EMAIL, EMAIL |
| Pseudocode | Yes | Algorithm 1: Training algorithm |
| Open Source Code | No | The paper does not contain an explicit statement or link indicating the availability of open-source code for the described methodology. |
| Open Datasets | Yes | We evaluate our proposed approach on three benchmark multi-agent tasks: Partial Observation Cooperative Predator Prey, Cooperative Navigation, and Cooperative Line Control. The benchmarks environment is implemented in a Multi-Agent Particle Environment (Lowe et al. 2017), |
| Dataset Splits | No | The paper mentions training and evaluating on benchmark multi-agent tasks but does not specify exact train/validation/test dataset splits or percentages. |
| Hardware Specification | Yes | All algorithms are trained in a Linux server with a 2.30 GHz Xeon(R) CPU and two Nvidia 4090 graphics cards. |
| Software Dependencies | No | The paper mentions various algorithms and environments (e.g., MADDPG, Multi-Agent Particle Environment) but does not provide specific version numbers for software dependencies or libraries like Python, PyTorch, etc. |
| Experiment Setup | Yes | The learning rates of the critic network and the actor network are set to 0.001. The discount factor γ is set to 0.95. Each episode lasts up to 25 timesteps. To estimate the transition marginal distribution $p(s^j_{t+1} \mid s^i_t)$, the number K of per Monte-Carlo sample is set to 64. |
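
The values quoted in the Experiment Setup row can be read as a small configuration plus a sampling-based estimate. The sketch below is illustrative only and is not the authors' code (the table notes that none is released). The constant names, the one-dimensional Gaussian transition model, and the function names are all hypothetical, and averaging a conditional density over K sampled actions is an assumption about how a transition marginal of this kind might be estimated by Monte Carlo.

```python
import math
import random

# Hyperparameters quoted in the Experiment Setup row (constant names are hypothetical).
ACTOR_LR = 0.001
CRITIC_LR = 0.001
GAMMA = 0.95
MAX_EPISODE_STEPS = 25
K_SAMPLES = 64


def gaussian_pdf(x, mean, std):
    """Density of a one-dimensional normal distribution at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def toy_transition_density(next_state, state, action, std=0.1):
    """Hypothetical p(s' | s, a): the next state is state + action plus Gaussian noise."""
    return gaussian_pdf(next_state, state + action, std)


def estimate_marginal(next_state, state, sample_action, k=K_SAMPLES):
    """Monte-Carlo estimate of p(s' | s) as the mean of p(s' | s, a) over k sampled actions."""
    total = 0.0
    for _ in range(k):
        total += toy_transition_density(next_state, state, sample_action())
    return total / k


# Assumed action distribution for the toy example: uniform on [-1, 1].
rng = random.Random(0)
estimate = estimate_marginal(0.5, 0.4, lambda: rng.uniform(-1.0, 1.0))
```

A larger K lowers the variance of the estimate at the cost of K density evaluations per query, which is the trade-off a setting such as K = 64 fixes.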