Multi-Agent Reinforcement Learning is a Sequence Modeling Problem
Authors: Muning Wen, Jakub Kuba, Runji Lin, Weinan Zhang, Ying Wen, Jun Wang, Yaodong Yang
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | To validate MAT, we conduct extensive experiments on StarCraft II, Multi-Agent MuJoCo, Dexterous Hands Manipulation, and Google Research Football benchmarks. Results demonstrate that MAT achieves superior performance and data efficiency compared to strong baselines including MAPPO and HAPPO. |
| Researcher Affiliation | Collaboration | Muning Wen (1,2), Jakub Grudzien Kuba (3), Runji Lin (4), Weinan Zhang (1), Ying Wen (1), Jun Wang (2,5), Yaodong Yang (6,7); 1 Shanghai Jiao Tong University, 2 Digital Brain Lab, 3 University of Oxford, 4 Institute of Automation, Chinese Academy of Sciences, 5 University College London, 6 Beijing Institute for General AI, 7 Institute for AI, Peking University |
| Pseudocode | Yes | We list the full pseudocode of MAT in Appendix A and a video that shows the dynamic data flow of MAT in https://sites.google.com/view/multi-agent-transformer. A minimal sketch of the autoregressive decoding the pseudocode describes appears after this table. |
| Open Source Code | Yes | The source code could be accessed directly with this link https://github.com/PKU-MARL/Multi-Agent-Transformer. |
| Open Datasets | Yes | To validate MAT, we conduct extensive experiments on StarCraft II, Multi-Agent MuJoCo, Dexterous Hands Manipulation, and Google Research Football benchmarks. |
| Dataset Splits | Yes | We investigate the generalization capability of pre-trained models on each downstream task with 0% (zero-shot), 1%, 5%, 10% few-shot new examples, respectively. A sketch of this split protocol also follows the table. |
| Hardware Specification | No | The paper mentions running "extensive experiments" on various benchmarks but does not specify any hardware details such as GPU models, CPU types, or memory used for these experiments. |
| Software Dependencies | No | The implementation is presumably in Python, as is typical for ML papers, but the paper does not specify any software versions (e.g., Python, PyTorch, TensorFlow, or CUDA versions). |
| Experiment Setup | Yes | We apply the same hyper-parameters of baseline algorithms from their original paper to ensure their best performance, and adopt the same hyper-parameter tuning process for our methods with details in Appendix B. |
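
For readers who want a concrete picture of what the Appendix A pseudocode describes, the following is a minimal sketch of MAT's encoder-decoder action generation: the encoder produces a joint representation of all agents' observations, and the decoder emits actions agent by agent, so that agent i's action conditions on the encoded observations and the actions of agents 1 through i-1. This is not the authors' implementation; it assumes discrete actions, greedy decoding, and illustrative PyTorch layer sizes (`d_model=64`, two layers, four heads), and covers only the inference pass, not MAT's PPO-style training objective.

```python
# Hedged sketch of MAT-style autoregressive action generation.
# Layer sizes, class names, and the greedy decoding loop are
# illustrative assumptions, not the authors' code.
import torch
import torch.nn as nn

class MATSketch(nn.Module):
    def __init__(self, obs_dim, act_dim, n_agents, d_model=64):
        super().__init__()
        self.n_agents = n_agents
        self.start_token = act_dim  # reserved index for a start-of-sequence token
        self.obs_embed = nn.Linear(obs_dim, d_model)
        self.act_embed = nn.Embedding(act_dim + 1, d_model)  # +1 for the start token
        self.encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True),
            num_layers=2,
        )
        self.decoder = nn.TransformerDecoder(
            nn.TransformerDecoderLayer(d_model, nhead=4, batch_first=True),
            num_layers=2,
        )
        self.act_head = nn.Linear(d_model, act_dim)

    @torch.no_grad()
    def act(self, obs):
        """obs: (batch, n_agents, obs_dim) -> actions: (batch, n_agents)."""
        # Encoder: joint representation of all agents' observations.
        memory = self.encoder(self.obs_embed(obs))
        # Decoder: generate actions agent by agent from a start token.
        actions = torch.full(
            (obs.size(0), 1), self.start_token, dtype=torch.long, device=obs.device
        )
        for _ in range(self.n_agents):
            h = self.decoder(self.act_embed(actions), memory)
            logits = self.act_head(h[:, -1])  # distribution for the next agent
            actions = torch.cat([actions, logits.argmax(-1, keepdim=True)], dim=1)
        return actions[:, 1:]  # drop the start token

# Usage: 3 agents, 10-dim observations, 5 discrete actions each.
model = MATSketch(obs_dim=10, act_dim=5, n_agents=3)
print(model.act(torch.randn(2, 3, 10)))
```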
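
Similarly, the few-shot protocol cited under Dataset Splits can be made concrete. A minimal sketch, assuming the downstream data is a list of trajectories and that few-shot examples are drawn by uniform random subsampling; the paper does not state the sampling procedure, so `few_shot_split` and its seeding are illustrative assumptions:

```python
import random

def few_shot_split(trajectories, fraction, seed=0):
    """Subsample a fraction of downstream-task data for fine-tuning.

    fraction=0.0 is the zero-shot setting; 0.01, 0.05, and 0.10 match
    the 1%, 5%, and 10% few-shot settings in the paper. The uniform
    random draw here is an assumption, not the paper's stated method.
    """
    rng = random.Random(seed)
    k = int(len(trajectories) * fraction)
    return rng.sample(trajectories, k)

# Evaluate a pre-trained model at each few-shot budget.
for frac in (0.0, 0.01, 0.05, 0.10):
    subset = few_shot_split(list(range(1000)), frac)
    print(f"{frac:.0%}: {len(subset)} fine-tuning examples")
```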