Population-Based Diverse Exploration for Sparse-Reward Multi-Agent Tasks
Authors: Pei Xu, Junge Zhang, Kaiqi Huang
IJCAI 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We empirically evaluate the proposed method on three challenging environments: a discrete version of the multiple-particle environment (MPE) [Wang et al., 2019], Google Research Football [Kurach et al., 2020], and StarCraft II micromanagement (SMAC) [Samvelyan et al., 2019]. In all experiments, we consider the sparse-reward setting. We show that our method significantly outperforms the state-of-the-art baselines on almost all tasks (RQ1 in Sec. 4.1). |
| Researcher Affiliation | Academia | Pei Xu¹, Junge Zhang¹, Kaiqi Huang¹,² — ¹CRISE, Institute of Automation, Chinese Academy of Sciences; ²CAS Center for Excellence in Brain Science and Intelligence Technology. pei.xu@ia.ac.cn, {jgzhang,kqhuang}@nlpr.ia.ac.cn |
| Pseudocode | No | The paper does not contain any pseudocode or clearly labeled algorithm blocks. |
| Open Source Code | No | The paper does not contain any explicit statement about releasing source code for the described methodology or a direct link to a code repository. |
| Open Datasets | Yes | We evaluate our method on three challenging environments: (1) a discrete version of the multiple-particle environment (MPE) [Wang et al., 2019]; (2) StarCraft II micromanagement (SMAC) [Samvelyan et al., 2019]; and (3) Google Research Football (GRF) [Kurach et al., 2020]. |
| Dataset Splits | No | The paper states 'All experiments run with five random seeds.' and 'Experiment details are given in supplementary material.', but the main text does not explicitly provide training/validation/test dataset splits or a cross-validation methodology. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU, GPU models, or memory specifications) used for running the experiments. |
| Software Dependencies | No | The paper does not list any specific software dependencies with version numbers (e.g., libraries, frameworks, or operating systems). |
| Experiment Setup | Yes | All experiments run with five random seeds. In GRF, all experiments follow the training settings of CDS [Chenghao et al., 2021], except that all experiments use TD(λ) to speed up training. To study the impact of the population size M on exploration, we train agents with different M under the reward-free setting. |
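The experiment-setup row notes that the GRF runs use TD(λ) to speed up training. For readers unfamiliar with the technique, the sketch below shows the standard λ-return computation that TD(λ) targets are built from; it is a generic illustration, not the authors' code, and the function name and array conventions (rewards `r_0..r_{T-1}`, bootstrapped values `V(s_1)..V(s_T)`) are assumptions for this example.

```python
import numpy as np

def lambda_returns(rewards, values, gamma=0.99, lam=0.95):
    """Compute lambda-returns backwards over one trajectory.

    rewards[t] is r_t; values[t] is the bootstrapped V(s_{t+1}).
    Implements G_t = r_t + gamma * ((1 - lam) * V(s_{t+1}) + lam * G_{t+1}),
    which interpolates between one-step TD (lam=0) and Monte Carlo (lam=1).
    """
    T = len(rewards)
    returns = np.zeros(T)
    next_return = values[-1]  # bootstrap from the final state value
    for t in reversed(range(T)):
        next_return = rewards[t] + gamma * (
            (1.0 - lam) * values[t] + lam * next_return
        )
        returns[t] = next_return
    return returns
```

With `lam=0` this reduces to one-step TD targets `r_t + gamma * V(s_{t+1})`; with `lam=1` it becomes the full discounted return bootstrapped from the last value, which is why intermediate λ values trade bias against variance and can speed up value learning.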