Population-Based Diverse Exploration for Sparse-Reward Multi-Agent Tasks

Authors: Pei Xu, Junge Zhang, Kaiqi Huang

IJCAI 2024

Each entry below lists the reproducibility variable, the assessed result, and the LLM response supporting that assessment.
Research Type: Experimental
"We empirically evaluate the proposed method on three challenging environments: a discrete version of the multiple-particle environment (MPE) [Wang et al., 2019], the Google Research Football [Kurach et al., 2020], and StarCraft II micromanagement (SMAC) [Samvelyan et al., 2019]. In all experiments, we consider the sparse-reward setting. We show that our method significantly outperforms the state-of-the-art baselines on almost all tasks (RQ1 in Sec. 4.1)."

Researcher Affiliation: Academia
"Pei Xu (1), Junge Zhang (1), Kaiqi Huang (1,2); (1) CRISE, Institute of Automation, Chinese Academy of Sciences; (2) CAS Center for Excellence in Brain Science and Intelligence Technology; pei.xu@ia.ac.cn, {jgzhang,kqhuang}@nlpr.ia.ac.cn"

Pseudocode: No
The paper does not contain any pseudocode or clearly labeled algorithm blocks.

Open Source Code: No
The paper does not contain any explicit statement about releasing source code for the described methodology, and it provides no direct link to a code repository.

Open Datasets: Yes
"We evaluate our method on three challenging environments: (1) a discrete version of the multiple-particle environment (MPE) [Wang et al., 2019]; (2) the StarCraft II micromanagement (SMAC) [Samvelyan et al., 2019]; and (3) the Google Research Football (GRF) [Kurach et al., 2020]." All three are publicly available benchmarks; a minimal instantiation sketch is given below.

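These benchmarks are open-source simulation environments rather than fixed datasets. As a hedged illustration only (the paper itself ships no setup code), the sketch below shows how SMAC and GRF are commonly instantiated from their public Python packages; the map name, scenario name, and representation are illustrative assumptions, not the paper's configuration. The paper's MPE variant is a custom discrete version from [Wang et al., 2019] and is not shown.

```python
# Hedged sketch: instantiating two of the benchmark environments from their
# public packages (SMAC from oxwhirl/smac, GRF from google-research/football).
# Map/scenario choices below are assumptions for illustration; the paper's
# exact settings are in its supplementary material.
from smac.env import StarCraft2Env
import gfootball.env as football_env

# StarCraft II micromanagement (SMAC); "3m" is an assumed example map.
smac_env = StarCraft2Env(map_name="3m")
info = smac_env.get_env_info()
print("SMAC:", info["n_agents"], "agents,", info["n_actions"], "actions")

# Google Research Football (GRF); scenario and representation are assumed.
grf_env = football_env.create_environment(
    env_name="academy_3_vs_1_with_keeper",
    representation="simple115v2",
    number_of_left_players_agent_controls=3,
)
obs = grf_env.reset()
print("GRF observation shape:", obs.shape)
```
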
Dataset Splits: No
The paper states "All experiments run with five random seeds." and "Experiment details are given in supplementary material." but does not explicitly provide training/validation/test dataset splits or a cross-validation methodology in the main text.

Hardware Specification: No
The paper does not provide specific hardware details (e.g., CPU or GPU models, or memory specifications) used for running the experiments.

Software Dependencies: No
The paper does not list any specific software dependencies with version numbers (e.g., libraries, frameworks, or operating systems).

Experiment Setup: Yes
"All experiments run with five random seeds. In GRF, all experiments follow the training settings of CDS [Chenghao et al., 2021], except that all experiments use TD(λ) to speed up training. To study the impact of the population size M on exploration, we train agents with different M under the reward-free setting." A sketch of the TD(λ) return computation referenced here follows below.

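The setup quote mentions TD(λ) to speed up training. For reference, here is a minimal sketch of the standard λ-return recursion that TD(λ) targets are built from; this is textbook material, not code from the paper, and the function name and default hyperparameters (gamma = 0.99, lam = 0.8) are assumptions.

```python
import numpy as np

def lambda_returns(rewards, next_values, gamma=0.99, lam=0.8):
    """Backward-recursive lambda-returns for one episode.

    Implements the standard recursion
        G_t = r_t + gamma * ((1 - lam) * V(s_{t+1}) + lam * G_{t+1}),
    where next_values[t] approximates V(s_{t+1}) (use 0 for terminal states).
    gamma and lam defaults are illustrative assumptions, not the paper's values.
    """
    T = len(rewards)
    returns = np.zeros(T)
    g = next_values[-1]  # bootstrap from the final next-state value
    for t in reversed(range(T)):
        g = rewards[t] + gamma * ((1.0 - lam) * next_values[t] + lam * g)
        returns[t] = g
    return returns

# Mirroring the paper's protocol of five random seeds (seed values assumed):
for seed in range(5):
    np.random.seed(seed)
    # ... build environments and agents with this seed, then train ...
```

These λ-returns would then replace one-step TD targets when training the critic, which is the usual reason TD(λ) accelerates learning under sparse rewards.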