Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

A Fully Decentralized Surrogate for Multi-Agent Policy Optimization

Authors: Kefan Su, Zongqing Lu

TMLR 2024

Reproducibility Variable Result LLM Response
Research Type Experimental Empirically, we evaluate DPO, IPPO, and independent Q-learning (IQL) in a variety of cooperative multi-agent tasks, covering discrete and continuous action spaces, as well as fully and partially observable environments. The results show DPO outperforms both IPPO and IQL in most tasks, which serves as evidence for our theoretical results.
Researcher Affiliation Academia Kefan Su, School of Computer Science, Peking University; Zongqing Lu, School of Computer Science, Peking University
Pseudocode Yes Algorithm 1 The practical algorithm of DPO
Open Source Code Yes The code is available at https://github.com/PKU-RL/DPO.
Open Datasets Yes In this section, we compare the practical algorithm of DPO with IPPO (de Witt et al., 2020) and IQL (Tan, 1993) in a variety of cooperative multi-agent environments, including a cooperative stochastic game, MPE (Lowe et al., 2017), multi-agent MuJoCo (Peng et al., 2021), and SMAC (Samvelyan et al., 2019), covering both discrete and continuous action spaces, as well as fully and partially observable environments.
Dataset Splits No The paper does not explicitly provide training/test/validation dataset splits. It describes using various multi-agent environments (MPE, MuJoCo, SMAC) and mentions that "all the learning curves are from 5 random seeds", which relates to experimental runs rather than static dataset partitioning.
Hardware Specification Yes We performed the whole experiment with a total of four NVIDIA A100 GPUs.
Software Dependencies No The paper mentions "The version of the game StarCraft 2 in SMAC is 4.10 for our experiments". While this specifies the version of the environment, it does not provide specific version numbers for general software dependencies like Python, PyTorch, TensorFlow, or other libraries used for implementation.
Experiment Setup Yes Table 2: Hyperparameters for all the experiments — MLP layers: 3; hidden size: 128; non-linearity: ReLU; optimizer: Adam; actor_lr: 5e-4; critic_lr: 5e-4; number of epochs: 15; initial β₁ⁱ: 0.01; initial β₂ⁱ: 0.01; δ: 1.5; ω: 2; d_target: different for environments, as aforementioned; clip parameter for IPPO: 0.2
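The hyperparameters quoted from Table 2 can be gathered into a single configuration object. The sketch below is illustrative only: the variable name DPO_CONFIG and the key names are assumptions for readability and are not taken from the paper or the official repository.

```python
# Hypothetical configuration mirroring the paper's Table 2.
# Key names are illustrative, not from the official DPO code.
DPO_CONFIG = {
    "mlp_layers": 3,          # number of MLP layers
    "hidden_size": 128,       # hidden units per layer
    "activation": "relu",     # non-linearity
    "optimizer": "adam",
    "actor_lr": 5e-4,
    "critic_lr": 5e-4,
    "num_epochs": 15,
    "beta1_init": 0.01,       # initial beta_1^i
    "beta2_init": 0.01,       # initial beta_2^i
    "delta": 1.5,
    "omega": 2,
    "d_target": None,         # environment-specific; set per task
    "ippo_clip": 0.2,         # clip parameter for the IPPO baseline
}
```

Note that d_target is left unset here because the paper states it differs across environments.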