Episodic Multi-agent Reinforcement Learning with Curiosity-driven Exploration
Authors: Lulu Zheng, Jiarui Chen, Jianhao Wang, Jiamin He, Yujing Hu, Yingfeng Chen, Changjie Fan, Yang Gao, Chongjie Zhang
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate EMC in didactic examples and a broad set of StarCraft II micromanagement benchmark tasks [8]. The didactic examples, along with detailed visualization, illustrate that our proposed intrinsic reward can guide agents' policies to novel or promising states, thus enabling effective coordinated exploration (a generic sketch of such an intrinsic reward appears after the table). Empirical results on more complicated StarCraft II tasks show that EMC significantly outperforms other state-of-the-art multi-agent baselines. |
| Researcher Affiliation | Collaboration | Lulu Zheng (1), Jiarui Chen (2, 3), Jianhao Wang (1), Jiamin He (4), Yujing Hu (3), Yingfeng Chen (3), Changjie Fan (3), Yang Gao (2), Chongjie Zhang (1). (1) Institute for Interdisciplinary Information Sciences, Tsinghua University, China; (2) Department of Computer Science and Technology, Nanjing University, China; (3) Fuxi AI Lab, NetEase, China; (4) Department of Computing Science, University of Alberta, Canada |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide a specific link or explicit statement about the release of source code for the described methodology. |
| Open Datasets | Yes | We conduct experiments in 17 benchmark tasks of StarCraft II, comprising 14 popular tasks proposed by SMAC [8] and three more super-hard cooperative tasks proposed by QPLEX [7]. |
| Dataset Splits | No | The paper does not provide specific dataset split information (exact percentages, sample counts, or detailed splitting methodology) for training, validation, and test sets. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory amounts) used for running its experiments. |
| Software Dependencies | No | The paper does not provide specific ancillary software details (e.g., library or solver names with version numbers) needed to replicate the experiment. |
| Experiment Setup | Yes | The two agents will receive a positive global reward r = 10 if and only if they arrive at the corresponding goal grid (denoted by the character G in Figure 3) at the same time. If only one arrives, the incoordination will be punished by a negative reward p. (A minimal sketch of this pay-off rule follows the table.) |
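
For concreteness, here is a minimal Python sketch of the didactic pay-off rule quoted in the "Experiment Setup" row. The function name, the goal-cell representation, the default punishment magnitude, and the zero reward when neither agent arrives are all illustrative assumptions; the quoted text only specifies r = 10 for simultaneous arrival and a negative punishment p for a solo arrival.

```python
# Hypothetical sketch of the two-agent grid-world pay-off described above.
# All names and default values below are assumptions, not the paper's code.

def global_reward(agent_positions, goal_cells, p=-2.0):
    """Shared reward for one step of the didactic coordination task.

    agent_positions: (row, col) grid cell of each agent.
    goal_cells:      the corresponding goal cells 'G' from Figure 3.
    p:               negative punishment when exactly one agent arrives
                     (the paper leaves its magnitude as a parameter).
    """
    arrived = [pos == goal for pos, goal in zip(agent_positions, goal_cells)]
    if all(arrived):   # both agents reach their goals at the same time
        return 10.0
    if any(arrived):   # only one arrives: incoordination is punished
        return p
    return 0.0         # assumed: no reward when neither agent has arrived
```

For example, `global_reward([(0, 3), (4, 3)], [(0, 3), (4, 3)])` returns 10.0, while a state in which only the first agent stands on its goal returns the punishment p.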
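
The paper's curiosity-driven exploration rests on an intrinsic reward that steers agents toward novel or promising states. As a hedged illustration only, the sketch below shows a generic prediction-error novelty bonus: the network architecture, the prediction target, and the scaling coefficient are assumptions, not EMC's exact formulation.

```python
import torch
import torch.nn as nn

# Generic curiosity-style novelty bonus: states that a learned predictor
# models poorly are treated as novel and earn a larger intrinsic reward.
# This is an illustrative sketch, not the paper's exact intrinsic reward.

class Predictor(nn.Module):
    def __init__(self, obs_dim: int, target_dim: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, target_dim),
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return self.net(obs)

def intrinsic_reward(predictor: Predictor,
                     obs: torch.Tensor,
                     target: torch.Tensor,
                     scale: float = 0.1) -> torch.Tensor:
    """Per-state novelty bonus: scaled squared prediction error."""
    with torch.no_grad():
        err = (predictor(obs) - target).pow(2).mean(dim=-1)
    return scale * err  # added to the extrinsic team reward during training
```

During training, the predictor is regressed toward the target on visited states, so frequently seen states yield a shrinking bonus while rarely visited ones keep a large one, which is what pushes exploration toward novelty.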