Trust Region Policy Optimisation in Multi-Agent Reinforcement Learning

Authors: Jakub Grudzien Kuba, Ruiqing Chen, Muning Wen, Ying Wen, Fanglei Sun, Jun Wang, Yaodong Yang

ICLR 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate the proposed methods on a series of Multi-Agent MuJoCo and StarCraft II tasks. Results show that HATRPO and HAPPO significantly outperform strong baselines such as IPPO, MAPPO and MADDPG on all tested tasks, thereby establishing a new state of the art.
Researcher Affiliation | Collaboration | ¹University of Oxford, ²Huawei R&D UK, ³ShanghaiTech University, ⁴Shanghai Jiao Tong University, ⁵University College London, ⁶Institute for AI, Peking University & BIGAI
Pseudocode | Yes | The detailed pseudocode of HATRPO is listed in Appendix D.3. ... We refer to the above procedure as HAPPO and Appendix D.4 for its full pseudocode. (An illustrative sketch of the sequential HAPPO update appears after this table.)
Open Source Code | Yes | Code is available at https://github.com/PKU-MARL/TRPO-PPO-in-MARL.
Open Datasets | Yes | We consider two most common benchmarks StarCraft II Multi-Agent Challenge (SMAC) (Samvelyan et al., 2019) and Multi-Agent MuJoCo (de Witt et al., 2020b) for evaluating MARL algorithms. (A minimal SMAC environment sketch appears after this table.)
Dataset Splits | No | The paper uses standard benchmark environments but does not specify explicit training/validation/test dataset splits (e.g., percentages or counts).
Hardware Specification | No | The paper does not specify any hardware details (e.g., CPU, GPU models, or memory) used for running the experiments.
Software Dependencies | No | The implementation of MADDPG is adopted from the Tianshou framework (Weng et al., 2021). The paper mentions software components such as the Adam optimizer and an MLP actor network, but does not provide specific version numbers for any libraries or frameworks (e.g., PyTorch, TensorFlow, Python, or Tianshou).
Experiment Setup | Yes | All hyperparameter settings and implementation details can be found in Appendix E. Tables 1-7 in Appendix E provide detailed hyperparameter values for the SMAC and Multi-Agent MuJoCo domains, including learning rates, batch sizes, optimizer settings, and other configuration parameters.
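
To make the Pseudocode row concrete, below is a minimal sketch of the sequential HAPPO update the paper describes: clipped PPO-style objectives applied agent by agent in a random order, with each agent's advantage reweighted by the probability ratios of the agents updated before it. The `agents`, `buffer`, `agent.actor.dist`, and `agent.optimizer` interfaces are assumptions for illustration, not the authors' released implementation; Appendix D.4 of the paper gives the authoritative pseudocode.

```python
import torch

def happo_update(agents, buffer, clip_eps=0.2, ppo_epochs=5):
    """One HAPPO iteration: sequential clipped updates in a random agent order.

    Assumed interfaces (illustrative only): `buffer.joint_advantage()` returns a
    detached joint advantage estimate A(s, a) per sample; `buffer.agent_batch(i)`
    returns that agent's observations, actions, and old log-probabilities; each
    agent exposes `.actor.dist(obs)` (a torch distribution) and `.optimizer`.
    """
    # Joint advantage from a centralised critic; M accumulates the probability
    # ratios of agents that have already been updated in this iteration.
    m = buffer.joint_advantage().clone()

    for i in torch.randperm(len(agents)).tolist():   # random permutation of agents
        agent = agents[i]
        obs, act, old_logp = buffer.agent_batch(i)

        for _ in range(ppo_epochs):
            logp = agent.actor.dist(obs).log_prob(act)
            ratio = torch.exp(logp - old_logp)
            # Clipped surrogate objective, with M in place of the raw advantage.
            surr1 = ratio * m
            surr2 = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * m
            loss = -torch.min(surr1, surr2).mean()

            agent.optimizer.zero_grad()
            loss.backward()
            agent.optimizer.step()

        # Fold this agent's final ratio into M for the agents updated next.
        with torch.no_grad():
            new_logp = agent.actor.dist(obs).log_prob(act)
            m = m * torch.exp(new_logp - old_logp)
```

The running factor M is what distinguishes this scheme from independent PPO: each agent optimises against the joint advantage reweighted by its predecessors' updates, which is the mechanism the paper's monotonic-improvement analysis builds on.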
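
For the Open Datasets row, note that the benchmarks are interactive environments rather than static datasets, which is also why no train/validation/test splits exist. As a rough illustration of how experience is generated in SMAC using the public `smac` package (the map name and the random policy are placeholders, unrelated to the paper's configurations):

```python
import numpy as np
from smac.env import StarCraft2Env

# Roll out one episode on a SMAC map with uniformly random valid actions.
env = StarCraft2Env(map_name="3m")        # placeholder map name
n_agents = env.get_env_info()["n_agents"]

env.reset()
terminated = False
episode_return = 0.0
while not terminated:
    obs = env.get_obs()                   # per-agent observations for the actors
    state = env.get_state()               # global state for a centralised critic
    actions = []
    for agent_id in range(n_agents):
        avail = env.get_avail_agent_actions(agent_id)
        actions.append(np.random.choice(np.nonzero(avail)[0]))
    reward, terminated, info = env.step(actions)
    episode_return += reward
env.close()
print("episode return:", episode_return)
```

Multi-Agent MuJoCo exposes a similar multi-agent, Gym-style interface in which each agent controls a subset of the robot's joints.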