Self-Organized Group for Cooperative Multi-agent Reinforcement Learning

Authors: Jianzhun Shao, Zhiqiang Lou, Hongchang Zhang, Yuhang Jiang, Shuncheng He, Xiangyang Ji

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Sufficient experiments on mainstream multi-agent benchmarks exhibit the superiority of SOG. We conduct experiments on three commonly used multi-agent benchmarks: a resource collection task, a predator-prey task, and a set of customized StarCraft micromanagement tasks.
Researcher Affiliation | Academia | Jianzhun Shao, Zhiqiang Lou, Hongchang Zhang, Yuhang Jiang, Shuncheng He, Xiangyang Ji; Department of Automation, Tsinghua University, Beijing, China. {sjz18, lzq20, hc-zhang19, jiangyh19, hesc16}@mails.tsinghua.edu.cn; xyji@tsinghua.edu.cn
Pseudocode | Yes | We summarize the training procedure in Algorithm 1.
Open Source Code | No | The paper does not contain any explicit statement about releasing open-source code, nor a link to a code repository for the described methodology.
Open Datasets | Yes | We test our idea on three commonly used multi-agent benchmarks: Resource Collection, Predator-Prey, and StarCraft II micromanagement tasks. The Resource Collection scenario is modified from the environment described in Liu et al. [18] and built on the multi-agent particle environment [21]. We apply our method to the StarCraft Multi-Agent Challenge (SMAC) [30]. The maps are designed by Iqbal et al. [12] and Liu et al. [18].
Dataset Splits | No | The paper describes training and testing scenarios with varying numbers of agents or environment conditions (e.g., "The number of the agents for training is uniformly sampled from {2,3,4,5}, while for testing it is sampled from {6,7,8}"). However, it does not provide traditional dataset splits (e.g., percentages or specific counts for train/validation/test sets) for a static dataset, as the environments are simulations where data is generated dynamically.
Hardware Specification | No | The paper does not provide specific details about the hardware used to run the experiments, such as GPU models, CPU types, or memory specifications.
Software Dependencies | No | The paper does not provide specific version numbers for any software dependencies or libraries used in the experiments.
Experiment Setup | Yes | We train on two kinds of SR: 0.5 and 1.0. Each experiment is repeated 3 or 5 times with different seeds. The number of the agents for training is uniformly sampled from {2,3,4,5}, while for testing it is sampled from {6,7,8}. The overall loss can be written as L_all = L_RL + λ1 · L_FP + λ2 · L_CEB, where λ1 and λ2 are hyper-parameters. Table 2 (ablation studies; (D) marks the default setting): Group Num ∈ {1, 2 (D), 4}; Msg dim ∈ {1, 3 (D), 10}; T ∈ {2, 4 (D), 10}.
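The overall loss reported in the experiment setup is a weighted sum of the RL loss and two auxiliary terms. A minimal sketch of that combination (the function name, argument names, and default λ values are hypothetical, not from the paper):

```python
def overall_loss(l_rl, l_fp, l_ceb, lambda1=0.5, lambda2=0.5):
    """Combine loss terms as L_all = L_RL + λ1 * L_FP + λ2 * L_CEB.

    lambda1 and lambda2 are the hyper-parameters mentioned in the
    paper; the default values here are placeholders.
    """
    return l_rl + lambda1 * l_fp + lambda2 * l_ceb
```

With λ2 = 0 the CEB term is disabled, which mirrors the kind of ablation the paper's Table 2 reports for other components.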
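The train/test generalization protocol quoted above (train on 2-5 agents, test on 6-8) can be sketched as follows; the helper name and signature are hypothetical illustrations, not code from the paper:

```python
import random

# Agent-count ranges reported in the paper:
# training episodes sample from {2,3,4,5}, testing from {6,7,8}.
TRAIN_AGENT_COUNTS = (2, 3, 4, 5)
TEST_AGENT_COUNTS = (6, 7, 8)

def sample_num_agents(mode, rng=random):
    """Uniformly sample the number of agents for one episode.

    `mode` is "train" or "test"; pass a seeded random.Random as `rng`
    for reproducible sampling.
    """
    pool = TRAIN_AGENT_COUNTS if mode == "train" else TEST_AGENT_COUNTS
    return rng.choice(pool)
```

Because the test pool is disjoint from and larger than the training pool, success on test episodes measures zero-shot generalization to more agents than were ever seen during training.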