Population-Based Diverse Exploration for Sparse-Reward Multi-Agent Tasks

Authors: Pei Xu, Junge Zhang, Kaiqi Huang

IJCAI 2024

Each entry below lists the reproducibility variable, the assessed result, and the LLM response supporting that assessment.
Research Type: Experimental
"We empirically evaluate the proposed method on three challenging environments: a discrete version of the multiple-particle environment (MPE) [Wang et al., 2019], the Google Research Football [Kurach et al., 2020], and StarCraft II micromanagement (SMAC) [Samvelyan et al., 2019]. In all experiments, we consider the sparse-reward setting. We show that our method significantly outperforms the state-of-the-art baselines on almost all tasks (RQ1 in Sec. 4.1)."

Researcher Affiliation: Academia
"Pei Xu (1), Junge Zhang (1), Kaiqi Huang (1,2); (1) CRISE, Institute of Automation, Chinese Academy of Sciences; (2) CAS Center for Excellence in Brain Science and Intelligence Technology; pei.xu@ia.ac.cn, {jgzhang,kqhuang}@nlpr.ia.ac.cn"

Pseudocode: No
The paper does not contain any pseudocode or clearly labeled algorithm blocks.

Open Source Code: No
The paper does not contain any explicit statement about releasing source code for the described methodology, and it provides no direct link to a code repository.

Open Datasets: Yes
"We evaluate our method on three challenging environments: (1) a discrete version of the multiple-particle environment (MPE) [Wang et al., 2019]; (2) the StarCraft II micromanagement (SMAC) [Samvelyan et al., 2019]; and (3) the Google Research Football (GRF) [Kurach et al., 2020]." All three are publicly available benchmarks; a minimal instantiation sketch is given below.

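These benchmarks are open-source simulation environments rather than fixed datasets. As a hedged illustration only (the paper itself ships no setup code), the sketch below shows how SMAC and GRF are commonly instantiated from their public Python packages; the map name, scenario name, and representation are illustrative assumptions, not the paper's configuration. The paper's MPE variant is a custom discrete version from [Wang et al., 2019] and is not shown.

```python
# Hedged sketch: instantiating two of the benchmark environments from their
# public packages (SMAC from oxwhirl/smac, GRF from google-research/football).
# Map/scenario choices below are assumptions for illustration; the paper's
# exact settings are in its supplementary material.
from smac.env import StarCraft2Env
import gfootball.env as football_env

# StarCraft II micromanagement (SMAC); "3m" is an assumed example map.
smac_env = StarCraft2Env(map_name="3m")
info = smac_env.get_env_info()
print("SMAC:", info["n_agents"], "agents,", info["n_actions"], "actions")

# Google Research Football (GRF); scenario and representation are assumed.
grf_env = football_env.create_environment(
    env_name="academy_3_vs_1_with_keeper",
    representation="simple115v2",
    number_of_left_players_agent_controls=3,
)
obs = grf_env.reset()
print("GRF observation shape:", obs.shape)
```
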
Dataset Splits: No
The paper states "All experiments run with five random seeds." and "Experiment details are given in supplementary material." but does not explicitly provide training/validation/test dataset splits or a cross-validation methodology in the main text.

Hardware Specification: No
The paper does not provide specific hardware details (e.g., CPU or GPU models, or memory specifications) used for running the experiments.

Software Dependencies: No
The paper does not list any specific software dependencies with version numbers (e.g., libraries, frameworks, or operating systems).

Experiment Setup: Yes
"All experiments run with five random seeds. In GRF, all experiments follow the training settings of CDS [Chenghao et al., 2021], except that all experiments use TD(λ) to speed up training. To study the impact of the population size M on exploration, we train agents with different M under the reward-free setting." A sketch of the TD(λ) return computation referenced here follows below.

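The setup quote mentions TD(λ) to speed up training. For reference, here is a minimal sketch of the standard λ-return recursion that TD(λ) targets are built from; this is textbook material, not code from the paper, and the function name and default hyperparameters (gamma = 0.99, lam = 0.8) are assumptions.

```python
import numpy as np

def lambda_returns(rewards, next_values, gamma=0.99, lam=0.8):
    """Backward-recursive lambda-returns for one episode.

    Implements the standard recursion
        G_t = r_t + gamma * ((1 - lam) * V(s_{t+1}) + lam * G_{t+1}),
    where next_values[t] approximates V(s_{t+1}) (use 0 for terminal states).
    gamma and lam defaults are illustrative assumptions, not the paper's values.
    """
    T = len(rewards)
    returns = np.zeros(T)
    g = next_values[-1]  # bootstrap from the final next-state value
    for t in reversed(range(T)):
        g = rewards[t] + gamma * ((1.0 - lam) * next_values[t] + lam * g)
        returns[t] = g
    return returns

# Mirroring the paper's protocol of five random seeds (seed values assumed):
for seed in range(5):
    np.random.seed(seed)
    # ... build environments and agents with this seed, then train ...
```

These λ-returns would then replace one-step TD targets when training the critic, which is the usual reason TD(λ) accelerates learning under sparse rewards.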