LDSA: Learning Dynamic Subtask Assignment in Cooperative Multi-Agent Reinforcement Learning
Authors: Mingyu Yang, Jian Zhao, Xunhan Hu, Wengang Zhou, Jiangcheng Zhu, Houqiang Li
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirical results show that LDSA learns reasonable and effective subtask assignment for better collaboration and significantly improves the learning performance on the challenging StarCraft II micromanagement benchmark and Google Research Football. |
| Researcher Affiliation | Collaboration | 1. University of Science and Technology of China, 2. Huawei Cloud, 3. Hefei Comprehensive National Science Center, Institute of Artificial Intelligence |
| Pseudocode | No | The paper describes the proposed method in text and figures (e.g., Figure 1), but it does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | We include the code, data, and instructions needed to reproduce the main experimental results in the supplemental material. |
| Open Datasets | Yes | We evaluate LDSA on the SMAC benchmark [27]... Additional experiments on Google Research Football (GRF) [20]... |
| Dataset Splits | No | The paper evaluates performance based on 'test win rate' over 'timesteps' and 'episodes' in reinforcement learning environments (SMAC and GRF), rather than explicitly defining 'training/test/validation dataset splits' in the manner of static supervised learning datasets. |
| Hardware Specification | No | The paper mentions support by a 'GPU cluster built by MCC Lab of Information Science and Technology Institution, USTC' in the Acknowledgments, but does not provide specific GPU models, CPU models, or detailed hardware specifications within the provided text. |
| Software Dependencies | No | The paper mentions implementing baselines using 'PyMARL [27]' but does not provide specific version numbers for PyMARL or any other software dependencies such as Python, PyTorch, or CUDA. |
| Experiment Setup | Yes | For all experiments, the number of subtasks is set to 4 and the length of subtask representations is set to 64, i.e., k = 4, m = 64. We carry out a grid search for regularization coefficients λϕ and λh on the SMAC scenario corridor and then set them to 10^-3 and 10^-3, respectively, for all scenarios. |
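The hyperparameters quoted in the Experiment Setup row can be collected into a single config, as one might pass to a PyMARL-style training script. This is a hypothetical sketch for illustration only; the key names below are assumptions, not identifiers from the authors' released code.

```python
# Hypothetical config collecting the LDSA hyperparameters reported in the paper.
# Key names are illustrative; only the values are taken from the quoted setup.
ldsa_config = {
    "n_subtasks": 4,         # k: number of subtasks
    "subtask_repr_dim": 64,  # m: length of subtask representations
    "lambda_phi": 1e-3,      # regularization coefficient (grid-searched on corridor)
    "lambda_h": 1e-3,        # regularization coefficient (grid-searched on corridor)
}

print(ldsa_config)
```

Both regularization coefficients land on the same value (10^-3) after the grid search, so in practice a single shared coefficient would reproduce the reported setting.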