Exploration via Joint Policy Diversity for Sparse-Reward Multi-Agent Tasks
Authors: Pei Xu, Junge Zhang, Kaiqi Huang
IJCAI 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We empirically evaluate the proposed method on three challenging environments: a discrete version of the multiple-particle environment (MPE) [Wang et al., 2019], the Google Research Football [Kurach et al., 2020] and StarCraft II micromanagement (SMAC) [Samvelyan et al., 2019]. In all experiments, we consider the sparse-reward setting. ... We show that our method significantly outperforms the state-of-the-art baselines on almost all tasks (RQ1 in Sec. 5.1). Fig. 1 shows a normalized sample size to achieve a success rate above 50% with respect to our method. ... To better understand the exploration behaviors of the proposed method, we present extensive experiments in the reward-free setting. Results show that our method, which is on top of a classical bonus-based method (i.e., count-based method), can explore significantly more states compared to the classical method (RQ2 in Sec. 5.2). (A generic count-based bonus is sketched below the table.) |
| Researcher Affiliation | Academia | Pei Xu (1,2), Junge Zhang (2), Kaiqi Huang (1,2,3). 1: School of Artificial Intelligence, University of Chinese Academy of Sciences; 2: CRISE, Institute of Automation, Chinese Academy of Sciences; 3: CAS Center for Excellence in Brain Science and Intelligence Technology |
| Pseudocode | No | No structured pseudocode or algorithm blocks were found in the paper. |
| Open Source Code | No | The paper does not provide any statement or link indicating that the source code for the methodology is openly available. |
| Open Datasets | Yes | We empirically evaluate the proposed method on three challenging environments: a discrete version of the multiple-particle environment (MPE) [Wang et al., 2019], the Google Research Football [Kurach et al., 2020] and StarCraft II micromanagement (SMAC) [Samvelyan et al., 2019]. |
| Dataset Splits | No | The paper mentions 'training curves' and 'training settings' and states that 'All experiments run with five random seeds', but it does not specify percentages, sample counts, or any methodology for training/validation/test dataset splits. |
| Hardware Specification | No | The paper does not provide any specific hardware details such as GPU models, CPU types, or memory specifications used for running experiments. |
| Software Dependencies | No | The paper mentions specific frameworks and methods such as QMIX and RND, but it does not list ancillary software dependencies with version numbers (e.g., Python 3.x, PyTorch x.x). (A generic RND sketch is given below the table for context.) |
| Experiment Setup | No | The paper states 'Details for environments and training are given in supplementary material.' While it mentions hyperparameters such as w1, w2, cu, and β, it does not provide their specific values or detailed training configurations in the main text. |
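
Note on the exploration bonus (see the Research Type row): the paper describes its method as sitting on top of a classical count-based bonus. The sketch below is a minimal, generic count-based bonus for discrete (hashable) observations, not the authors' joint-policy-diversity objective or their exact formulation; the bonus scale `beta` and the use of `hash()` as the state key are illustrative assumptions.

```python
import math
from collections import Counter


class CountBasedBonus:
    """Generic count-based exploration bonus: r_int = beta / sqrt(N(s))."""

    def __init__(self, beta: float = 0.1):
        self.beta = beta         # bonus scale (assumed hyperparameter)
        self.counts = Counter()  # visitation counts keyed by hashed state

    def bonus(self, state) -> float:
        key = hash(state)        # assumes discrete, hashable observations
        self.counts[key] += 1
        return self.beta / math.sqrt(self.counts[key])
```

In use, the intrinsic term is simply added to the sparse environment reward, e.g. `r_total = r_env + explorer.bonus(obs)`.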
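
The Software Dependencies row mentions QMIX and RND without versions. For context only, the sketch below shows the standard Random Network Distillation (RND) bonus: a frozen, randomly initialized target network and a trained predictor whose prediction error serves as an intrinsic reward. The network sizes, feature dimension, and learning rate are placeholders and are not taken from the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class RNDBonus(nn.Module):
    """Generic RND intrinsic bonus: predictor error against a frozen random target."""

    def __init__(self, obs_dim: int, feat_dim: int = 64, lr: float = 1e-4):
        super().__init__()

        def make_mlp():
            return nn.Sequential(
                nn.Linear(obs_dim, 128), nn.ReLU(), nn.Linear(128, feat_dim)
            )

        self.target = make_mlp()      # frozen random target network
        self.predictor = make_mlp()   # trained to imitate the target
        for p in self.target.parameters():
            p.requires_grad_(False)
        self.opt = torch.optim.Adam(self.predictor.parameters(), lr=lr)

    def bonus(self, obs: torch.Tensor) -> torch.Tensor:
        """Return a per-observation novelty signal and take one predictor step."""
        with torch.no_grad():
            target_feat = self.target(obs)
        pred_feat = self.predictor(obs)
        per_obs_error = F.mse_loss(pred_feat, target_feat, reduction="none").mean(dim=-1)
        self.opt.zero_grad()
        per_obs_error.mean().backward()
        self.opt.step()
        return per_obs_error.detach()  # larger error ~ less familiar state
```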