Concentration Network for Reinforcement Learning of Large-Scale Multi-Agent Systems

Authors: Qingxu Fu, Tenghai Qiu, Jianqiang Yi, Zhiqiang Pu, Shiguang Wu
Pages: 9341-9349

AAAI 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments demonstrate that the presented architecture has excellent scalability and flexibility, and significantly outperforms existing methods on LMAS benchmarks.
Researcher Affiliation | Academia | (1) Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China. (2) School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing 100049, China.
Pseudocode | No | The paper includes architectural diagrams (Figure 1) but no explicit pseudocode or algorithm blocks.
Open Source Code | Yes | The source code is available at the following repository: https://github.com/binary-husky/hmp2g/tree/aaai-conc
Open Datasets | No | The paper introduces a new LMAS benchmark environment called Decentralised Collective Assault (DCA) and describes its characteristics, but does not explicitly state that it is a publicly available dataset with concrete access information (link, DOI, formal citation).
Dataset Splits | No | The paper mentions training and testing stages and that 'At each update, we use trajectories collected from 64 episodes', but it does not specify any training/validation/test dataset splits or their percentages.
Hardware Specification | Yes | The experiments are performed with an RTX 8000 GPU, which takes around a day to train 50vs50 or 2 days to train 100vs100 from scratch.
Software Dependencies | No | The paper mentions using 'PPO learner proposed in (Schulman et al. 2017) and improved in (Ye et al. 2020)', but it does not provide specific version numbers for any software dependencies or libraries.
Experiment Setup | Yes | In all experiments, the learning rate is 5e-4, and the discount factor γ is 0.99. At each update, we use trajectories collected from 64 episodes. The GAE parameter λ is 0.95. We select dc = 2 as default, and choose the Dual-Conc Net model shown in Fig. 1(b) as an ablation baseline, referred to as Conc for simplicity.
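
To make the Experiment Setup values concrete, the following is a minimal Python sketch of how the reported hyperparameters (learning rate 5e-4, γ = 0.99, λ = 0.95, 64 episodes per update) might feed a standard generalized advantage estimation (GAE) step in a PPO-style learner. It is not the authors' implementation: the names `TrainConfig` and `gae_advantages` are hypothetical, and the sketch assumes each trajectory is a finished episode, so the value after the last step defaults to 0.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class TrainConfig:
    """Hyperparameters quoted in the Experiment Setup row.

    The field names are hypothetical; only the values come from the source.
    """
    learning_rate: float = 5e-4
    gamma: float = 0.99           # discount factor
    gae_lambda: float = 0.95      # GAE parameter
    episodes_per_update: int = 64


def gae_advantages(
    rewards: List[float],
    values: List[float],
    gamma: float,
    lam: float,
    last_value: float = 0.0,
) -> List[float]:
    """Standard GAE over one trajectory (assumed to be a finished episode).

    rewards[t] and values[t] are the reward and value estimate at step t;
    last_value is the value estimate after the final step (0 if terminal).
    """
    if len(rewards) != len(values):
        raise ValueError("rewards and values must have the same length")

    advantages = [0.0] * len(rewards)
    running = 0.0
    next_value = last_value
    # Walk backwards so each step can reuse the accumulated future term.
    for t in reversed(range(len(rewards))):
        delta = rewards[t] + gamma * next_value - values[t]  # TD residual
        running = delta + gamma * lam * running
        advantages[t] = running
        next_value = values[t]
    return advantages


cfg = TrainConfig()
# Toy two-step episode with made-up rewards and zero value estimates.
example = gae_advantages([1.0, 1.0], [0.0, 0.0], cfg.gamma, cfg.gae_lambda)
```

In a full training loop, this function would be applied to each of the 64 episodes collected before an update; the optimizer and policy network are omitted here because the source gives no details about them beyond the learning rate.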