Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
A Fully Decentralized Surrogate for Multi-Agent Policy Optimization
Authors: Kefan Su, Zongqing Lu
TMLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirically, we evaluate DPO, IPPO, and independent Q-learning (IQL) in a variety of cooperative multi-agent tasks, covering discrete and continuous action spaces, as well as fully and partially observable environments. The results show DPO outperforms both IPPO and IQL in most tasks, which serves as evidence for our theoretical results. |
| Researcher Affiliation | Academia | Kefan Su EMAIL School of Computer Science Peking University Zongqing Lu EMAIL School of Computer Science Peking University |
| Pseudocode | Yes | Algorithm 1 The practical algorithm of DPO |
| Open Source Code | Yes | The code is available at https://github.com/PKU-RL/DPO. |
| Open Datasets | Yes | In this section, we compare the practical algorithm of DPO with IPPO (de Witt et al., 2020) and IQL (Tan, 1993) in a variety of cooperative multi-agent environments, including a cooperative stochastic game, MPE (Lowe et al., 2017), multi-agent MuJoCo (Peng et al., 2021), and SMAC (Samvelyan et al., 2019), covering both discrete and continuous action spaces, as well as fully and partially observable environments. |
| Dataset Splits | No | The paper does not explicitly provide training/test/validation dataset splits. It describes using various multi-agent environments (MPE, MuJoCo, SMAC) and mentions that "all the learning curves are from 5 random seeds", which relates to experimental runs rather than static dataset partitioning. |
| Hardware Specification | Yes | We performed the whole experiment with a total of four NVIDIA A100 GPUs. |
| Software Dependencies | No | The paper mentions "The version of the game StarCraft II in SMAC is 4.10 for our experiments". While this specifies the version of the environment, it does not provide specific version numbers for general software dependencies such as Python, PyTorch, TensorFlow, or other libraries used for implementation. |
| Experiment Setup | Yes | Table 2: Hyperparameters for all the experiments — MLP layers: 3; hidden size: 128; non-linearity: ReLU; optimizer: Adam; actor_lr: 5e-4; critic_lr: 5e-4; number of epochs: 15; initial βᵢ¹: 0.01; initial βᵢ²: 0.01; δ: 1.5; ω: 2; d_target: different per environment, as aforementioned; clip parameter for IPPO: 0.2 |
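The hyperparameters quoted above can be collected into a single configuration sketch for reference. Note that the key names below are illustrative and are not taken from the PKU-RL/DPO repository's actual config files:

```python
# Hyperparameters reported in Table 2 of the paper, gathered as a plain dict.
# Key names are assumptions for illustration, not the repository's config keys.
DPO_HYPERPARAMS = {
    "mlp_layers": 3,
    "hidden_size": 128,
    "activation": "ReLU",
    "optimizer": "Adam",
    "actor_lr": 5e-4,
    "critic_lr": 5e-4,
    "num_epochs": 15,
    "initial_beta_1": 0.01,   # initial value of the per-agent coefficient β_i^1
    "initial_beta_2": 0.01,   # initial value of the per-agent coefficient β_i^2
    "delta": 1.5,
    "omega": 2,
    "d_target": None,         # environment-dependent in the paper; set per run
    "ippo_clip": 0.2,         # clip parameter used by the IPPO baseline
}
```

Keeping `d_target` as a per-run field mirrors the paper's note that its value differs across environments.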