Decompose a Task into Generalizable Subtasks in Multi-Agent Reinforcement Learning
Authors: Zikang Tian, Ruizhi Chen, Xing Hu, Ling Li, Rui Zhang, Fan Wu, Shaohui Peng, Jiaming Guo, Zidong Du, Qi Guo, Yunji Chen
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirical results demonstrate that DT2GS possesses sound zero-shot generalization capability across tasks, exhibits sufficient transferability, and outperforms existing methods in both multi-task and single-task problems. |
| Researcher Affiliation | Collaboration | (1) SKL of Processors, Institute of Computing Technology, CAS, Beijing, China; (2) University of Chinese Academy of Sciences, Beijing, China; (3) Cambricon Technologies, Beijing, China; (4) Intelligent Software Research Center, Institute of Software, CAS, Beijing, China; (5) Shanghai Innovation Center for Processor Technologies, SHIC, Shanghai, China |
| Pseudocode | Yes | Pseudocode illustrating the DT2GS framework is provided in Appendix H. |
| Open Source Code | No | The paper does not include any explicit statement or link indicating that the source code for the described methodology is publicly available. |
| Open Datasets | Yes | We evaluated the performance of DT2GS on the StarCraft Multi-Agent Challenge (SMAC) [22] and the multi-agent particle world environments (MPE) [16] (shown in Appendix D). A minimal SMAC loading sketch follows the table. |
| Dataset Splits | No | The paper evaluates zero-shot generalization by training on source tasks and deploying to target tasks without fine-tuning, but it does not specify explicit train/validation/test splits (e.g., percentages or counts) that would be needed to reproduce results within a single task. |
| Hardware Specification | No | The paper does not explicitly describe the specific hardware used for running its experiments (e.g., GPU models, CPU types, or cloud instance specifications). |
| Software Dependencies | No | The paper mentions algorithms (e.g., MAPPO, LSTM, GRU) and optimizers (Adam) but does not provide specific software dependencies with version numbers (e.g., Python version, PyTorch/TensorFlow versions, or other library versions). |
| Experiment Setup | Yes | We set the number of subtasks n_k to 4 and averaged all results over 4 random seeds. Table 2 (List of Hyperparameters) in Appendix G lists: DT2GS encoder hidden-layer dimension 8, MLP hidden-layer dimension 64, attention hidden-layer dimension 64, 3 attention heads, 4 subtasks, Adam optimizer, and a learning rate of 0.0005 for both actor and critic. These values are collected into a config sketch after the table. |
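
As a companion to the Open Datasets row, here is a minimal sketch of instantiating and driving one SMAC task with random actions. It assumes the `smac` Python package and a local StarCraft II installation; the map name `8m` is an illustrative choice, not one taken from the paper.

```python
# Minimal SMAC rollout sketch (assumes `pip install smac` plus StarCraft II).
from smac.env import StarCraft2Env
import numpy as np

env = StarCraft2Env(map_name="8m")   # one SMAC task; "8m" is illustrative
env_info = env.get_env_info()
n_agents = env_info["n_agents"]

env.reset()
terminated = False
while not terminated:
    # Sample a random action for each agent from its available actions.
    actions = []
    for agent_id in range(n_agents):
        avail = env.get_avail_agent_actions(agent_id)
        actions.append(np.random.choice(np.nonzero(avail)[0]))
    reward, terminated, info = env.step(actions)
env.close()
```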
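
And a hedged sketch gathering the Appendix G hyperparameters quoted in the Experiment Setup row into a single Python config. The key names are our own labels; only the values come from the paper.

```python
# Hyperparameters from Table 2 (Appendix G) of the paper; key names are
# assumptions introduced here for readability, not the paper's identifiers.
DT2GS_CONFIG = {
    "encoder_hidden_dim": 8,   # DT2GS encoder hidden-layer dimension
    "mlp_hidden_dim": 64,      # MLP hidden-layer dimension
    "attn_hidden_dim": 64,     # attention hidden-layer dimension
    "attn_heads": 3,           # number of attention heads
    "n_subtasks": 4,           # number of subtasks n_k
    "optimizer": "Adam",
    "lr_actor": 5e-4,          # actor learning rate
    "lr_critic": 5e-4,         # critic learning rate
    "n_seeds": 4,              # results averaged over 4 random seeds
}
```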