Decompose a Task into Generalizable Subtasks in Multi-Agent Reinforcement Learning

Authors: Zikang Tian, Ruizhi Chen, Xing Hu, Ling Li, Rui Zhang, Fan Wu, Shaohui Peng, Jiaming Guo, Zidong Du, Qi Guo, Yunji Chen

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Empirical results demonstrate that DT2GS possesses sound zero-shot generalization capability across tasks, exhibits sufficient transferability, and outperforms existing methods in both multi-task and single-task problems."
Researcher Affiliation | Collaboration | (1) SKL of Processors, Institute of Computing Technology, CAS, Beijing, China; (2) University of Chinese Academy of Sciences, Beijing, China; (3) Cambricon Technologies, Beijing, China; (4) Intelligent Software Research Center, Institute of Software, CAS, Beijing, China; (5) Shanghai Innovation Center for Processor Technologies, SHIC, Shanghai, China
Pseudocode | Yes | Pseudocode illustrating the DT2GS framework is provided in Appendix H.
Open Source Code | No | The paper does not include any explicit statement or link indicating that the source code for the described methodology is publicly available.
Open Datasets | Yes | "We evaluated the performance of DT2GS on the StarCraft Multi-Agent Challenge (SMAC) [22] and the multi-agent particle world environments (MPE) [16] (shown in Appendix D)." A sketch of loading a SMAC task follows the table.
Dataset Splits | No | The paper evaluates zero-shot generalization by training on source tasks and deploying to target tasks without fine-tuning, but it does not specify explicit train/validation/test splits (e.g., percentages or counts) for reproduction within a single-task context.
Hardware Specification | No | The paper does not explicitly describe the hardware used to run its experiments (e.g., GPU models, CPU types, or cloud instance specifications).
Software Dependencies | No | The paper mentions algorithms (e.g., MAPPO, LSTM, GRU) and the Adam optimizer, but it does not provide specific software dependencies with version numbers (e.g., Python, PyTorch/TensorFlow, or other library versions).
Experiment Setup | Yes | "We set the number of subtasks n_k to 4 and averaged all results over 4 random seeds." Additionally, Table 2 (List of Hyperparameters) in Appendix G lists: DT2GS's encoder hidden layer dimension 8, MLP hidden layer dimension 64, attention hidden layer dimension 64, attention heads 3, number of subtasks 4, optimizer Adam, and learning rate 0.0005 for both actor and critic. These values are collected in the config sketch following the table.
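
As a pointer for reproduction, the SMAC benchmark named in the Open Datasets row is publicly available. Below is a minimal sketch of loading a SMAC task with the standard smac package (assuming `pip install smac` and a local StarCraft II installation); the map name "8m" and the random-action rollout are illustrative assumptions, not the paper's configuration.

```python
# Minimal SMAC rollout sketch. The map "8m" is an illustrative choice,
# not necessarily one of the paper's source or target tasks.
import numpy as np
from smac.env import StarCraft2Env

env = StarCraft2Env(map_name="8m")
env_info = env.get_env_info()
n_agents = env_info["n_agents"]

env.reset()
terminated = False
episode_return = 0.0
while not terminated:
    # Random policy placeholder; DT2GS would instead select actions from
    # its subtask-conditioned agent policies.
    actions = []
    for agent_id in range(n_agents):
        avail = env.get_avail_agent_actions(agent_id)
        avail_ids = np.nonzero(avail)[0]
        actions.append(np.random.choice(avail_ids))
    reward, terminated, info = env.step(actions)
    episode_return += reward
env.close()
print("Episode return:", episode_return)
```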
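
To make the reported hyperparameters easier to reuse, the values quoted in the Experiment Setup row are collected in the sketch below. The dictionary keys are hypothetical placeholders (the paper does not define a config schema); only the values come from the paper's Appendix G, Table 2.

```python
# Hyperparameters reported for DT2GS (Appendix G, Table 2).
# Key names are illustrative assumptions, not the paper's own identifiers.
DT2GS_CONFIG = {
    "n_subtasks": 4,           # number of subtasks n_k
    "encoder_hidden_dim": 8,   # DT2GS encoder hidden layer dimension
    "mlp_hidden_dim": 64,      # MLP hidden layer dimension
    "attn_hidden_dim": 64,     # attention hidden layer dimension
    "attn_heads": 3,           # number of attention heads
    "optimizer": "adam",
    "actor_lr": 5e-4,          # learning rate of the actor
    "critic_lr": 5e-4,         # learning rate of the critic
    "n_seeds": 4,              # results averaged over 4 random seeds
}
```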