Exploration via Joint Policy Diversity for Sparse-Reward Multi-Agent Tasks

Authors: Pei Xu, Junge Zhang, Kaiqi Huang

IJCAI 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We empirically evaluate the proposed method on three challenging environments: a discrete version of the multiple-particle environment (MPE) [Wang et al., 2019], the Google Research Football [Kurach et al., 2020] and StarCraft II micromanagement (SMAC) [Samvelyan et al., 2019]. In all experiments, we consider the sparse-reward setting. ... We show that our method significantly outperforms the state-of-the-art baselines on almost all tasks (RQ1 in Sec. 5.1). Fig. 1 shows a normalized sample size to achieve a success rate above 50% with respect to our method. ... To better understand the exploration behaviors of the proposed method, we present extensive experiments in the reward-free setting. Results show that our method, which is on top of a classical bonus-based method (i.e., count-based method), can explore significantly more states compared to the classical method (RQ2 in Sec. 5.2). (A generic sketch of such a count-based bonus appears after this table.)
Researcher Affiliation | Academia | Pei Xu (1,2), Junge Zhang (2), Kaiqi Huang (1,2,3). Affiliations: 1) School of Artificial Intelligence, University of Chinese Academy of Sciences; 2) CRISE, Institute of Automation, Chinese Academy of Sciences; 3) CAS Center for Excellence in Brain Science and Intelligence Technology.
Pseudocode | No | No structured pseudocode or algorithm blocks were found in the paper.
Open Source Code | No | The paper does not provide any statement or link indicating that the source code for the methodology is openly available.
Open Datasets | Yes | We empirically evaluate the proposed method on three challenging environments: a discrete version of the multiple-particle environment (MPE) [Wang et al., 2019], the Google Research Football [Kurach et al., 2020] and StarCraft II micromanagement (SMAC) [Samvelyan et al., 2019].
Dataset Splits | No | The paper mentions 'training curves' and 'training settings' and notes that 'All experiments run with five random seeds', but it does not provide percentages, sample counts, or any other methodology for training/validation/test dataset splits.
Hardware Specification | No | The paper does not provide any specific hardware details, such as GPU models, CPU types, or memory specifications, used for running the experiments.
Software Dependencies | No | The paper mentions using specific frameworks or methods such as QMIX and RND, but does not list any ancillary software dependencies with version numbers (e.g., Python 3.x, PyTorch x.x).
Experiment Setup | No | The paper states 'Details for environments and training are given in supplementary material.' It mentions hyperparameters such as w1, w2, cu, and β, but does not provide their specific values or detailed training configurations in the main text.
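
The Research Type response above quotes the paper as building 'on top of a classical bonus-based method (i.e., count-based method)'. Since no source code is released (see the Open Source Code row), the sketch below is a minimal, generic illustration of that classical count-based bonus only, not the authors' joint-policy-diversity method; the class name, the state-hashing scheme, and the bonus scale beta are all illustrative assumptions.

# Minimal sketch of a classical count-based exploration bonus,
# r_int(s) = beta / sqrt(N(s)). This is NOT the authors' method;
# it only illustrates the generic bonus the paper says it builds on.
from collections import defaultdict

import numpy as np


class CountBasedBonus:
    def __init__(self, beta=0.1):
        self.beta = beta                 # bonus scale (hypothetical value)
        self.counts = defaultdict(int)   # visit counts N(s)

    def __call__(self, state):
        # Hash the (discrete) state into a dictionary key.
        key = tuple(np.asarray(state).ravel().tolist())
        self.counts[key] += 1
        return self.beta / np.sqrt(self.counts[key])


# Example: the intrinsic reward decays as the same state is revisited.
bonus = CountBasedBonus(beta=0.1)
s = [0, 1]
print(bonus(s), bonus(s), bonus(s))  # 0.1, ~0.0707, ~0.0577

As a usage note, the printed rewards decay as beta / sqrt(N(s)) with each revisit, matching the classical bonus the quote refers to; how the authors combine such a bonus with their joint-policy-diversity objective cannot be reproduced from the main text alone.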