Settling the Variance of Multi-Agent Policy Gradients
Authors: Jakub Grudzien Kuba, Muning Wen, Linghui Meng, Shangding Gu, Haifeng Zhang, David Mguni, Jun Wang, Yaodong Yang
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | On benchmarks of Multi-Agent MuJoCo and StarCraft challenges, our OB technique effectively stabilises training and improves the performance of multi-agent PPO and COMA algorithms by a significant margin. Code is released at https://github.com/morning9393/Optimal-Baseline-for-Multi-agent-Policy-Gradients. |
| Researcher Affiliation | Collaboration | 1Imperial College London, 2Huawei R&D UK, 3Shanghai Jiao Tong University, 4Institute of Automation, Chinese Academy of Science, 5University College London, 6Institute for AI, Peking University. |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code is released at https://github.com/morning9393/Optimal-Baseline-for-Multi-agent-Policy-Gradients. |
| Open Datasets | Yes | StarCraft Multi-Agent Challenge (SMAC) [25]. In SMAC, each individual unit is controlled by a learning agent, which has finitely many possible actions to take. The units cooperate to defeat enemy bots across scenarios of different levels of difficulty. (...) Multi-Agent MuJoCo [5]. |
| Dataset Splits | No | The paper does not explicitly provide specific dataset split information (exact percentages, sample counts, or detailed splitting methodology) for training, validation, and testing. |
| Hardware Specification | No | The paper does not provide specific hardware details (exact GPU/CPU models, processor types with speeds, memory amounts, or detailed computer specifications) used for running its experiments. |
| Software Dependencies | No | The paper mentions 'PyTorch' in Appendix D but does not specify version numbers for PyTorch or any other software dependencies needed to replicate the experiment. |
| Experiment Setup | Yes | For each baseline on each task, we report the results of five random seeds. We refer to Appendix F for the detailed hyper-parameter settings for baselines. |
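The paper's central idea, reducing the variance of multi-agent policy gradients via a baseline, builds on the standard result that subtracting a baseline from the return in a score-function estimator leaves the gradient unbiased while shrinking its variance. The sketch below illustrates only this generic REINFORCE-style effect on toy data; it is not the paper's optimal baseline (OB) derivation, and all names and numbers are illustrative assumptions.

```python
import numpy as np

# Toy illustration (not the paper's OB): compare the empirical variance of the
# score-function gradient estimate g = grad_logp * (R - b) for a zero baseline
# versus b = mean return. The baseline changes the variance, not the mean.
rng = np.random.default_rng(0)
grad_logp = rng.normal(size=10_000)      # toy score-function samples
returns = 5.0 + rng.normal(size=10_000)  # toy returns with a large offset

def grad_variance(baseline):
    """Empirical variance of the baseline-subtracted gradient estimate."""
    g = grad_logp * (returns - baseline)
    return g.var()

var_no_baseline = grad_variance(0.0)
var_mean_baseline = grad_variance(returns.mean())
print(var_no_baseline, var_mean_baseline)
```

With the offset in the toy returns, the mean baseline cuts the estimator's variance by roughly an order of magnitude while the expected gradient is unchanged, which is the effect the paper's OB technique optimises in the multi-agent setting.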