Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
A Fully Decentralized Surrogate for Multi-Agent Policy Optimization
Authors: Kefan Su, Zongqing Lu
TMLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirically, we evaluate DPO, IPPO, and independent Q-learning (IQL) in a variety of cooperative multi-agent tasks, covering discrete and continuous action spaces, as well as fully and partially observable environments. The results show DPO outperforms both IPPO and IQL in most tasks, which serves as evidence for our theoretical results. |
| Researcher Affiliation | Academia | Kefan Su EMAIL School of Computer Science Peking University Zongqing Lu EMAIL School of Computer Science Peking University |
| Pseudocode | Yes | Algorithm 1 The practical algorithm of DPO |
| Open Source Code | Yes | The code is available at https://github.com/PKU-RL/DPO. |
| Open Datasets | Yes | In this section, we compare the practical algorithm of DPO with IPPO (de Witt et al., 2020) and IQL (Tan, 1993) in a variety of cooperative multi-agent environments, including a cooperative stochastic game, MPE (Lowe et al., 2017), multi-agent MuJoCo (Peng et al., 2021), and SMAC (Samvelyan et al., 2019), covering both discrete and continuous action spaces, as well as fully and partially observable environments. |
| Dataset Splits | No | The paper does not explicitly provide training/test/validation dataset splits. It describes using various multi-agent environments (MPE, MuJoCo, SMAC) and mentions that "all the learning curves are from 5 random seeds", which relates to experimental runs rather than static dataset partitioning. |
| Hardware Specification | Yes | We performed the whole experiment with a total of four NVIDIA A100 GPUs. |
| Software Dependencies | No | The paper mentions "The version of the game StarCraft II in SMAC is 4.10 for our experiments". While this specifies the version of the environment, it does not provide specific version numbers for general software dependencies such as Python, PyTorch, TensorFlow, or other libraries used for implementation. |
| Experiment Setup | Yes | Table 2: Hyperparameters for all the experiments — MLP layers: 3; hidden size: 128; non-linearity: ReLU; optimizer: Adam; actor_lr: 5e-4; critic_lr: 5e-4; number of epochs: 15; initial βᵢ¹: 0.01; initial βᵢ²: 0.01; δ: 1.5; ω: 2; d_target: different per environment, as aforementioned; clip parameter for IPPO: 0.2 |
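The hyperparameters quoted above can be collected into a single configuration sketch for reference. Note that the key names below are illustrative and are not taken from the PKU-RL/DPO repository's actual config files:

```python
# Hyperparameters reported in Table 2 of the paper, gathered as a plain dict.
# Key names are assumptions for illustration, not the repository's config keys.
DPO_HYPERPARAMS = {
    "mlp_layers": 3,
    "hidden_size": 128,
    "activation": "ReLU",
    "optimizer": "Adam",
    "actor_lr": 5e-4,
    "critic_lr": 5e-4,
    "num_epochs": 15,
    "initial_beta_1": 0.01,   # initial value of the per-agent coefficient β_i^1
    "initial_beta_2": 0.01,   # initial value of the per-agent coefficient β_i^2
    "delta": 1.5,
    "omega": 2,
    "d_target": None,         # environment-dependent in the paper; set per run
    "ippo_clip": 0.2,         # clip parameter used by the IPPO baseline
}
```

Keeping `d_target` as a per-run field mirrors the paper's note that its value differs across environments.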