Episodic Multi-agent Reinforcement Learning with Curiosity-driven Exploration

Authors: Lulu Zheng, Jiarui Chen, Jianhao Wang, Jiamin He, Yujing Hu, Yingfeng Chen, Changjie Fan, Yang Gao, Chongjie Zhang

NeurIPS 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate EMC in didactic examples and a broad set of StarCraft II micromanagement benchmark tasks [8]. The didactic examples, along with detailed visualization, illustrate that our proposed intrinsic reward can guide agents' policies to novel or promising states, thus enabling effectively coordinated exploration. Empirical results on more complicated StarCraft II tasks show that EMC significantly outperforms other state-of-the-art multi-agent baselines. (See the intrinsic-reward sketch after this table.)
Researcher Affiliation | Collaboration | Lulu Zheng¹, Jiarui Chen²,³, Jianhao Wang¹, Jiamin He⁴, Yujing Hu³, Yingfeng Chen³, Changjie Fan³, Yang Gao², Chongjie Zhang¹. ¹Institute for Interdisciplinary Information Sciences, Tsinghua University, China; ²Department of Computer Science and Technology, Nanjing University, China; ³Fuxi AI Lab, NetEase, China; ⁴Department of Computing Science, University of Alberta, Canada
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks.
Open Source Code | No | The paper neither links to source code for the described method nor states that code will be released.
Open Datasets | Yes | We conduct experiments on 17 benchmark tasks of StarCraft II, comprising 14 popular tasks proposed by SMAC [8] and three additional super-hard cooperative tasks proposed by QPLEX [7]. (See the SMAC usage sketch after this table.)
Dataset Splits | No | The paper does not provide specific dataset split information (exact percentages, sample counts, or detailed splitting methodology) for training, validation, and test sets.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory amounts) used for running its experiments.
Software Dependencies | No | The paper does not provide specific ancillary software details (e.g., library or solver names with version numbers) needed to replicate the experiment.
Experiment Setup | Yes | The two agents receive a positive global reward r = 10 if and only if they both arrive at their corresponding goal grids (marked by the character G in Figure 3) at the same time. If only one arrives, the incoordination is punished with a negative reward p. (A reward-function sketch follows the table.)
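
The intrinsic reward cited in the Research Type row is curiosity-driven. Below is a minimal sketch of one common way to realize such a reward, as the prediction error of a small learned network against each agent's individual utility values. The class name, network sizes, and choice of inputs are illustrative assumptions, not the authors' implementation (no source code is released, per the Open Source Code row).

```python
import torch
import torch.nn as nn


class CuriosityModule(nn.Module):
    """Sketch of a curiosity-style intrinsic reward: the prediction
    error of a small network regressing per-agent utility values.
    Architecture and inputs are illustrative assumptions only."""

    def __init__(self, obs_dim: int, n_actions: int, hidden: int = 64):
        super().__init__()
        self.predictor = nn.Sequential(
            nn.Linear(obs_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, n_actions),
        )

    def intrinsic_reward(self, obs: torch.Tensor, q_values: torch.Tensor) -> torch.Tensor:
        """obs: (batch, obs_dim); q_values: (batch, n_actions).
        Returns a (batch,) tensor of intrinsic rewards."""
        pred = self.predictor(obs)
        # Detach the target so the curiosity signal does not
        # backpropagate into the agent's Q-network.
        return ((pred - q_values.detach()) ** 2).mean(dim=-1)
```

During training, such a signal would typically be mixed into the environment reward as r = r_ext + β · r_int for some small coefficient β (a standard practice; the coefficient is not a value taken from the paper).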
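
The SMAC benchmark named in the Open Datasets row is publicly available (github.com/oxwhirl/smac). The following random-agent rollout uses SMAC's documented API; the map name "3m" is a standard easy task chosen for illustration and is not necessarily among the paper's 17 tasks.

```python
import numpy as np
from smac.env import StarCraft2Env

# Instantiate one SMAC micromanagement task; requires StarCraft II
# and the smac package to be installed.
env = StarCraft2Env(map_name="3m")
n_agents = env.get_env_info()["n_agents"]

env.reset()
terminated = False
episode_return = 0.0
while not terminated:
    actions = []
    for agent_id in range(n_agents):
        # Sample uniformly among the actions currently available
        # to this agent (SMAC masks unavailable actions).
        avail = env.get_avail_agent_actions(agent_id)
        actions.append(np.random.choice(np.nonzero(avail)[0]))
    reward, terminated, _ = env.step(actions)
    episode_return += reward
env.close()
print("episode return:", episode_return)
```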
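
The didactic reward rule quoted in the Experiment Setup row can be transcribed directly. The function signature below and the treatment of p as a penalty magnitude are our assumptions; the paper states only the reward values themselves.

```python
def didactic_reward(agent1_at_goal: bool, agent2_at_goal: bool, p: float = 2.0) -> float:
    """Global reward for the two-agent didactic gridworld:
    r = 10 only if both agents reach their goal grids at the
    same time; if exactly one arrives, the team is punished.
    The default penalty magnitude p is a placeholder, not a
    value taken from the paper."""
    if agent1_at_goal and agent2_at_goal:
        return 10.0
    if agent1_at_goal or agent2_at_goal:
        return -p  # incoordination penalty (negative reward)
    return 0.0
```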