Self-Organized Group for Cooperative Multi-agent Reinforcement Learning
Authors: Jianzhun Shao, Zhiqiang Lou, Hongchang Zhang, Yuhang Jiang, Shuncheng He, Xiangyang Ji
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Sufficient experiments on mainstream multi-agent benchmarks exhibit the superiority of SOG. We conduct experiments on three commonly used multi-agent benchmarks: a resource collection task, a predator-prey task, and a set of customized StarCraft micromanagement tasks. |
| Researcher Affiliation | Academia | Jianzhun Shao, Zhiqiang Lou, Hongchang Zhang, Yuhang Jiang, Shuncheng He, Xiangyang Ji. Department of Automation, Tsinghua University, Beijing, China. {sjz18, lzq20, hc-zhang19, jiangyh19, hesc16}@mails.tsinghua.edu.cn; xyji@tsinghua.edu.cn |
| Pseudocode | Yes | We summarize the training procedure in Algorithm 1. |
| Open Source Code | No | The paper does not contain any explicit statement about releasing open-source code or a link to a code repository for the described methodology. |
| Open Datasets | Yes | We test our idea on three commonly used multi-agent benchmarks: Resource Collection, Predator-Prey, and StarCraft II micromanagement tasks. This scenario is modified from the environment described in Liu et al. [18], built on the multi-agent particle environment [21]. We apply our method to the StarCraft multi-agent challenge (SMAC) [30]. The maps are designed by Iqbal et al. [12] and Liu et al. [18]. |
| Dataset Splits | No | The paper describes training and testing scenarios with varying numbers of agents or environment conditions (e.g., "The number of agents for training is uniformly sampled from {2,3,4,5}, while for testing it is sampled from {6,7,8}"; see the sampling sketch after this table). However, it does not provide traditional dataset splits (e.g., percentages or specific counts for train/validation/test sets) for a static dataset, since the environments are simulations in which data is generated dynamically. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used to run the experiments, such as GPU models, CPU types, or memory specifications. |
| Software Dependencies | No | The paper does not provide specific version numbers for any software dependencies or libraries used in the experiments. |
| Experiment Setup | Yes | We train on two kinds of SR: 0.5 and 1.0. Each experiment is repeated 3 or 5 times with different seeds. The number of agents for training is uniformly sampled from {2,3,4,5}, while for testing it is sampled from {6,7,8}. The overall loss can be written as $L_{\text{all}} = L_{\text{RL}} + \lambda_1 L_{\text{FP}} + \lambda_2 L_{\text{CEB}}$, where $\lambda_1$ and $\lambda_2$ are hyper-parameters (see the loss sketch after this table). Table 2 ablation settings, with (D) marking the default: Group Num ∈ {1, 2(D), 4}; Msg dim ∈ {1, 3(D), 10}; T ∈ {2, 4(D), 10}. |
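The Dataset Splits row describes per-episode sampling of agent counts rather than a static data split. Below is a minimal Python sketch of that sampling, assuming uniform draws from the paper's stated train and test pools; the function `sample_num_agents` and the constant names are illustrative, not from the paper.

```python
import random

TRAIN_AGENT_COUNTS = [2, 3, 4, 5]  # agent counts seen during training (from the paper)
TEST_AGENT_COUNTS = [6, 7, 8]      # larger, unseen counts used at test time

def sample_num_agents(mode: str) -> int:
    """Uniformly sample the number of agents for one episode."""
    pool = TRAIN_AGENT_COUNTS if mode == "train" else TEST_AGENT_COUNTS
    return random.choice(pool)

# Example: draw the agent count for one training and one test episode.
print(sample_num_agents("train"))  # e.g. 4
print(sample_num_agents("test"))   # e.g. 7
```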
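The overall loss in the Experiment Setup row is a weighted sum of three terms. A minimal PyTorch sketch of that combination, assuming each term has already been computed as a scalar tensor; the function name, argument names, and the example λ values are our assumptions, only the weighted-sum form comes from the paper.

```python
import torch

def total_loss(l_rl: torch.Tensor,
               l_fp: torch.Tensor,
               l_ceb: torch.Tensor,
               lambda1: float,
               lambda2: float) -> torch.Tensor:
    """Combine the RL, FP, and CEB terms: L_all = L_RL + λ1·L_FP + λ2·L_CEB."""
    return l_rl + lambda1 * l_fp + lambda2 * l_ceb

# Toy usage with placeholder scalar losses (values and λs are arbitrary).
l_rl = torch.tensor(1.2, requires_grad=True)
l_fp = torch.tensor(0.4, requires_grad=True)
l_ceb = torch.tensor(0.1, requires_grad=True)
loss = total_loss(l_rl, l_fp, l_ceb, lambda1=0.5, lambda2=0.1)
loss.backward()  # in a real model, gradients flow back through each term
```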