Multiagent Q-learning with Sub-Team Coordination

Authors: Wenhan Huang, Kai Li, Kun Shao, Tianze Zhou, Matthew Taylor, Jun Luo, Dongge Wang, Hangyu Mao, Jianye Hao, Jun Wang, Xiaotie Deng

NeurIPS 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental results show that QSCAN's performance dominates state-of-the-art methods in matrix games, predator-prey tasks, the Switch challenge in MA-Gym. Additionally, QSCAN achieves comparable performances to those methods in a selection of StarCraft II micro-management tasks.
Researcher Affiliation | Collaboration | 1 Shanghai Jiao Tong University, 2 Huawei Noah's Ark Lab, 3 Beijing Institute of Technology, 4 University of Alberta, 5 Alberta Machine Intelligence Institute (Amii), 6 EPFL, 7 Tianjin University, 8 University College London, 9 Peking University
Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks.
Open Source Code | No | We provide the detailed structure of our approaches in Sec. 4 and the benchmarks are open-sourced. We will publish our code after acceptance.
Open Datasets | Yes | We compare our approaches QPAIR, QSCAN with QMIX and QPLEX in various coordination tasks, including matrix games, predator-prey challenges [8], the Switch task [16], and the StarCraft Multi-Agent Challenge (SMAC) [17]. ... [8] Wendelin Böhmer, Vitaly Kurin, and Shimon Whiteson. Deep coordination graphs. ... [16] Anurag Koul. ma-gym: Collection of multi-agent environments based on openai gym. https://github.com/koulanurag/ma-gym, 2019. ... [17] Mikayel Samvelyan, Tabish Rashid, Christian Schroeder de Witt, Gregory Farquhar, Nantas Nardelli, Tim G. J. Rudner, Chia-Man Hung, Philip H. S. Torr, Jakob Foerster, and Shimon Whiteson. The StarCraft Multi-Agent Challenge.
Dataset Splits | No | The paper describes training details and environments but does not specify explicit training/validation/test splits with percentages or counts; such splits are typical of supervised learning on fixed datasets rather than of interactive reinforcement learning environments.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running the experiments. The authors' checklist explicitly states: "Did you include the total amount of compute and the type of resources used (e.g., type of GPUs, internal cluster, or cloud provider)? [No]"
Software Dependencies | No | The paper does not provide specific software dependencies with version numbers, such as Python versions or library versions.
Experiment Setup | Yes | For a fair comparison, we use the same neural network architectures as QMIX and QPLEX, with the same hyper-parameters and training configurations used in their original papers. Specifically, we set the learning rate to 5e-4 for all the scenarios. We train for 2M timesteps for matrix game, 10M timesteps for predator-prey, 2M timesteps for Switch and 10M timesteps for SMAC. For all cases, the Adam optimizer is used with ε = 10^-5. The discount factor is 0.99. The target networks are updated every 200 episodes. We use an epsilon-greedy exploration strategy, with epsilon decaying linearly from 1 to 0.05 over 1M timesteps.
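The hyper-parameters quoted in the Experiment Setup row can be collected into a single training configuration. The sketch below uses PyMARL-style key names as an assumption; it is not the authors' released configuration.

```python
# Hedged sketch of the reported training configuration (key names are assumed,
# values are taken from the Experiment Setup row above).
config = {
    "lr": 5e-4,                        # learning rate for all scenarios
    "optimizer": "adam",
    "optim_eps": 1e-5,                 # Adam epsilon (10^-5)
    "gamma": 0.99,                     # discount factor
    "target_update_interval": 200,     # target networks updated every 200 episodes
    "epsilon_start": 1.0,              # epsilon-greedy exploration ...
    "epsilon_finish": 0.05,            # ... decayed linearly to 0.05
    "epsilon_anneal_time": 1_000_000,  # over 1M timesteps
}

# Total environment timesteps per benchmark, as reported in the paper.
t_max = {
    "matrix_game": 2_000_000,
    "predator_prey": 10_000_000,
    "switch": 2_000_000,
    "smac": 10_000_000,
}
```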
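The benchmarks cited in the Open Datasets row are publicly available. The following is a minimal sketch of how they can be instantiated; the environment ID and map name are illustrative assumptions, not necessarily the exact scenarios the authors evaluated.

```python
# Hedged sketch: loading the open-source benchmarks named in the paper.
# "Switch2-v0" and "3m" are example identifiers only.
import gym                          # ma-gym registers its environments with Gym
from smac.env import StarCraft2Env  # SMAC (Samvelyan et al., 2019)

# Switch task from ma-gym (Koul, 2019)
switch_env = gym.make("ma_gym:Switch2-v0")

# A StarCraft II micro-management map from SMAC
smac_env = StarCraft2Env(map_name="3m")
```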