Settling the Variance of Multi-Agent Policy Gradients

Authors: Jakub Grudzien Kuba, Muning Wen, Linghui Meng, Shangding Gu, Haifeng Zhang, David Mguni, Jun Wang, Yaodong Yang

NeurIPS 2021

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "On benchmarks of Multi-Agent MuJoCo and StarCraft challenges, our OB technique effectively stabilises training and improves the performance of multi-agent PPO and COMA algorithms by a significant margin." |
| Researcher Affiliation | Collaboration | 1. Imperial College London; 2. Huawei R&D UK; 3. Shanghai Jiao Tong University; 4. Institute of Automation, Chinese Academy of Sciences; 5. University College London; 6. Institute for AI, Peking University |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code is released at https://github.com/morning9393/Optimal-Baseline-for-Multi-agent-Policy-Gradients. |
| Open Datasets | Yes | StarCraft Multi-Agent Challenge (SMAC) [25]: each individual unit is controlled by a learning agent with finitely many possible actions, and the units cooperate to defeat enemy bots across scenarios of different levels of difficulty. Also Multi-Agent MuJoCo [5]. |
| Dataset Splits | No | The paper does not provide specific dataset split information (exact percentages, sample counts, or splitting methodology) for training, validation, and testing. |
| Hardware Specification | No | The paper does not provide specific hardware details (exact GPU/CPU models, processor speeds, or memory amounts) used to run its experiments. |
| Software Dependencies | No | The paper mentions PyTorch in Appendix D but does not specify version numbers for PyTorch or any other software dependencies needed to replicate the experiments. |
| Experiment Setup | Yes | "For each of the baselines on each task, we report the results of five random seeds. We refer to Appendix F for the detailed hyper-parameter settings for baselines." |