Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

Concentration Network for Reinforcement Learning of Large-Scale Multi-Agent Systems

Authors: Qingxu Fu, Tenghai Qiu, Jianqiang Yi, Zhiqiang Pu, Shiguang Wu | Pages: 9341-9349

AAAI 2022

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Extensive experiments demonstrate that the presented architecture has excellent scalability and flexibility, and significantly outperforms existing methods on LMAS benchmarks.
Researcher Affiliation	Academia	1 Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China. 2 School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing 100049, China.
Pseudocode	No	The paper includes architectural diagrams (Figure 1) but no explicit pseudocode or algorithm blocks.
Open Source Code	Yes	The source code is available at the following repository. https://github.com/binary-husky/hmp2g/tree/aaai-conc.
Open Datasets	No	The paper introduces a new LMAS benchmark environment called Decentralised Collective Assault (DCA) and describes its characteristics, but does not explicitly state it is a publicly available dataset with concrete access information (link, DOI, formal citation).
Dataset Splits	No	The paper mentions training and testing stages and that 'At each update, we use trajectories collected from 64 episodes', but it does not specify any training/validation/test dataset splits or their percentages.
Hardware Specification	Yes	The experiments are performed with an RTX 8000 GPU, which takes around a day to train 50vs50 or 2 days to train 100vs100 from scratch.
Software Dependencies	No	The paper mentions using 'PPO learner proposed in (Schulman et al. 2017) and improved in (Ye et al. 2020)', but it does not provide specific version numbers for any software dependencies or libraries.
Experiment Setup	Yes	In all experiments, the learning rate is 5e-4, and the discount factor γ is 0.99. At each update, we use trajectories collected from 64 episodes. The GAE parameter λ is 0.95. We select dc = 2 as default, and choose the Dual-Conc Net model shown in Fig. 1(b) as an ablation baseline, referred to as Conc for simplicity.
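
The Experiment Setup row reports a discount factor γ of 0.99 and a GAE parameter λ of 0.95. As a minimal sketch of how those two values typically enter a PPO-style update, the snippet below computes generalized advantage estimates for a single trajectory. It is not taken from the authors' repository: the function name `gae_advantages`, its arguments, and the toy rewards and value estimates are all hypothetical and chosen only for illustration.

```python
from typing import List, Sequence

GAMMA = 0.99   # discount factor reported in the Experiment Setup row
LAMBDA = 0.95  # GAE parameter reported in the Experiment Setup row


def gae_advantages(
    rewards: Sequence[float],
    values: Sequence[float],
    last_value: float = 0.0,
    gamma: float = GAMMA,
    lam: float = LAMBDA,
) -> List[float]:
    """Generalized advantage estimates for one trajectory (hypothetical helper).

    rewards[t] is the reward received at step t, values[t] is the critic's
    estimate for the state at step t, and last_value is the estimate for the
    state after the final step (0.0 if the episode terminated).
    """
    if len(rewards) != len(values):
        raise ValueError("rewards and values must have the same length")

    advantages = [0.0] * len(rewards)
    running = 0.0            # discounted sum of TD residuals so far
    next_value = last_value  # value of the state following step t
    for t in reversed(range(len(rewards))):
        # One-step temporal-difference residual at step t.
        delta = rewards[t] + gamma * next_value - values[t]
        # Accumulate residuals backwards with weight gamma * lam.
        running = delta + gamma * lam * running
        advantages[t] = running
        next_value = values[t]
    return advantages


if __name__ == "__main__":
    # Toy two-step trajectory with made-up rewards and zero value estimates.
    print(gae_advantages([1.0, 1.0], [0.0, 0.0]))
```

With λ close to 1 the estimate leans on long-horizon returns, while smaller values rely more on the critic; the reported 0.95 is a common middle ground in PPO setups.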