Exploration via Joint Policy Diversity for Sparse-Reward Multi-Agent Tasks

Authors: Pei Xu, Junge Zhang, Kaiqi Huang

IJCAI 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We empirically evaluate the proposed method on three challenging environments: a discrete version of the multiple-particle environment (MPE) [Wang et al., 2019], the Google Research Football [Kurach et al., 2020] and StarCraft II micromanagement (SMAC) [Samvelyan et al., 2019]. In all experiments, we consider the sparse-reward setting. ... We show that our method significantly outperforms the state-of-the-art baselines on almost all tasks (RQ1 in Sec. 5.1). Fig. 1 shows a normalized sample size to achieve a success rate above 50% with respect to our method. ... To better understand the exploration behaviors of the proposed method, we present extensive experiments in the reward-free setting. Results show that our method, which is on top of a classical bonus-based method (i.e., count-based method), can explore significantly more states compared to the classical method (RQ2 in Sec. 5.2). (A generic sketch of such a count-based bonus appears after this table.)
Researcher Affiliation | Academia | Pei Xu (1,2), Junge Zhang (2), Kaiqi Huang (1,2,3). Affiliations: 1) School of Artificial Intelligence, University of Chinese Academy of Sciences; 2) CRISE, Institute of Automation, Chinese Academy of Sciences; 3) CAS Center for Excellence in Brain Science and Intelligence Technology.
Pseudocode | No | No structured pseudocode or algorithm blocks were found in the paper.
Open Source Code | No | The paper does not provide any statement or link indicating that the source code for the methodology is openly available.
Open Datasets | Yes | We empirically evaluate the proposed method on three challenging environments: a discrete version of the multiple-particle environment (MPE) [Wang et al., 2019], the Google Research Football [Kurach et al., 2020] and StarCraft II micromanagement (SMAC) [Samvelyan et al., 2019].
Dataset Splits | No | The paper mentions 'training curves' and 'training settings' and notes that 'All experiments run with five random seeds', but it does not provide percentages, sample counts, or any other methodology for training/validation/test dataset splits.
Hardware Specification | No | The paper does not provide any specific hardware details, such as GPU models, CPU types, or memory specifications, used for running the experiments.
Software Dependencies | No | The paper mentions using specific frameworks or methods such as QMIX and RND, but does not list any ancillary software dependencies with version numbers (e.g., Python 3.x, PyTorch x.x).
Experiment Setup | No | The paper states 'Details for environments and training are given in supplementary material.' It mentions hyperparameters such as w1, w2, cu, and β, but does not provide their specific values or detailed training configurations in the main text.
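
The Research Type response above quotes the paper as building 'on top of a classical bonus-based method (i.e., count-based method)'. Since no source code is released (see the Open Source Code row), the sketch below is a minimal, generic illustration of that classical count-based bonus only, not the authors' joint-policy-diversity method; the class name, the state-hashing scheme, and the bonus scale beta are all illustrative assumptions.

# Minimal sketch of a classical count-based exploration bonus,
# r_int(s) = beta / sqrt(N(s)). This is NOT the authors' method;
# it only illustrates the generic bonus the paper says it builds on.
from collections import defaultdict

import numpy as np


class CountBasedBonus:
    def __init__(self, beta=0.1):
        self.beta = beta                 # bonus scale (hypothetical value)
        self.counts = defaultdict(int)   # visit counts N(s)

    def __call__(self, state):
        # Hash the (discrete) state into a dictionary key.
        key = tuple(np.asarray(state).ravel().tolist())
        self.counts[key] += 1
        return self.beta / np.sqrt(self.counts[key])


# Example: the intrinsic reward decays as the same state is revisited.
bonus = CountBasedBonus(beta=0.1)
s = [0, 1]
print(bonus(s), bonus(s), bonus(s))  # 0.1, ~0.0707, ~0.0577

As a usage note, the printed rewards decay as beta / sqrt(N(s)) with each revisit, matching the classical bonus the quote refers to; how the authors combine such a bonus with their joint-policy-diversity objective cannot be reproduced from the main text alone.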