Multiagent Q-learning with Sub-Team Coordination
Authors: Wenhan Huang, Kai Li, Kun Shao, Tianze Zhou, Matthew Taylor, Jun Luo, Dongge Wang, Hangyu Mao, Jianye Hao, Jun Wang, Xiaotie Deng
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results show that QSCAN's performance dominates state-of-the-art methods in matrix games, predator-prey tasks, and the Switch challenge in MA-Gym. Additionally, QSCAN achieves comparable performance to those methods in a selection of StarCraft II micro-management tasks. |
| Researcher Affiliation | Collaboration | 1 Shanghai Jiao Tong University, 2 Huawei Noah's Ark Lab, 3 Beijing Institute of Technology, 4 University of Alberta, 5 Alberta Machine Intelligence Institute (Amii), 6 EPFL, 7 Tianjin University, 8 University College London, 9 Peking University |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | No | We provide the detailed structure of our approaches in Sec. 4 and the benchmarks are open-sourced. We will publish our code after acceptance. |
| Open Datasets | Yes | We compare our approaches QPAIR, QSCAN with QMIX and QPLEX in various coordination tasks, including matrix games, predator-prey challenges [8], the Switch task [16], and the StarCraft Multi-Agent Challenge (SMAC) [17]. ... [8] Wendelin Böhmer, Vitaly Kurin, and Shimon Whiteson. Deep coordination graphs. ... [16] Anurag Koul. ma-gym: Collection of multi-agent environments based on openai gym. https://github.com/koulanurag/ma-gym, 2019. ... [17] Mikayel Samvelyan, Tabish Rashid, Christian Schroeder de Witt, Gregory Farquhar, Nantas Nardelli, Tim G. J. Rudner, Chia-Man Hung, Philip H. S. Torr, Jakob Foerster, and Shimon Whiteson. The StarCraft Multi-Agent Challenge. (A hedged sketch of instantiating these benchmark environments follows the table.) |
| Dataset Splits | No | The paper describes training details and environments but does not specify explicit training/validation/test dataset splits with percentages or counts; such splits are typical of supervised learning on fixed datasets rather than of interactive reinforcement-learning environments. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running the experiments. The authors' checklist explicitly states: "Did you include the total amount of compute and the type of resources used (e.g., type of GPUs, internal cluster, or cloud provider)? [No]" |
| Software Dependencies | No | The paper does not provide specific software dependencies with version numbers, such as Python versions or library versions. |
| Experiment Setup | Yes | For a fair comparison, we use the same neural network architectures as QMIX and QPLEX, with the same hyper-parameters and training configurations used in their original papers. Specifically, we set the learning rate to 5e-4 for all the scenarios. We train for 2M timesteps for the matrix game, 10M timesteps for predator-prey, 2M timesteps for Switch, and 10M timesteps for SMAC. For all cases, the Adam optimizer is used with ε = 10⁻⁵. The discount factor is 0.99. The target networks are updated every 200 episodes. We use an epsilon-greedy exploration strategy, with epsilon decaying linearly from 1 to 0.05 over 1M timesteps. (A hedged configuration sketch collecting these values follows the table.) |
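
The benchmarks cited in the Open Datasets row are open-sourced and can be instantiated through their public APIs. The sketch below follows the ma-gym and SMAC READMEs; the environment ID `Switch2-v0` and map name `8m` are illustrative choices, not necessarily the paper's exact experimental configurations.

```python
# Hedged sketch: instantiating the open-source benchmarks from the
# Open Datasets row. Calls follow the public ma-gym and SMAC READMEs;
# "Switch2-v0" and "8m" are illustrative, not the paper's exact setups.
import gym                      # ma-gym registers its environments with gym
from smac.env import StarCraft2Env

# Switch task from ma-gym [16]
switch_env = gym.make("ma_gym:Switch2-v0")
obs_n = switch_env.reset()      # one observation per agent

# A StarCraft II micro-management map from SMAC [17]
smac_env = StarCraft2Env(map_name="8m")
env_info = smac_env.get_env_info()
n_agents, n_actions = env_info["n_agents"], env_info["n_actions"]
```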
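The Experiment Setup row fully determines the shared training hyper-parameters. A minimal sketch collecting them in a PyMARL-style configuration is shown below; the key names are assumptions for illustration, since the authors' code was not released.

```python
# Hedged sketch: hyper-parameters from the Experiment Setup row gathered
# into a PyMARL-style config. Key names are illustrative assumptions;
# the authors' own configuration files were not released.
config = {
    "lr": 5e-4,                        # learning rate, all scenarios
    "optimizer": "adam",
    "optim_eps": 1e-5,                 # Adam epsilon
    "gamma": 0.99,                     # discount factor
    "target_update_interval": 200,     # episodes between target-network updates
    "epsilon_start": 1.0,              # epsilon-greedy exploration schedule
    "epsilon_finish": 0.05,
    "epsilon_anneal_time": 1_000_000,  # timesteps of linear decay
    "t_max": {                         # training budget per benchmark (timesteps)
        "matrix_game": 2_000_000,
        "predator_prey": 10_000_000,
        "switch": 2_000_000,
        "smac": 10_000_000,
    },
}

def epsilon_at(t: int) -> float:
    """Linearly decayed exploration rate at timestep t."""
    frac = min(t / config["epsilon_anneal_time"], 1.0)
    return config["epsilon_start"] + frac * (config["epsilon_finish"] - config["epsilon_start"])
```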