Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Episodic Multi-agent Reinforcement Learning with Curiosity-driven Exploration
Authors: Lulu Zheng, Jiarui Chen, Jianhao Wang, Jiamin He, Yujing Hu, Yingfeng Chen, Changjie Fan, Yang Gao, Chongjie Zhang
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate EMC in didactic examples and a broad set of StarCraft II micromanagement benchmark tasks [8]. The didactic examples, along with detailed visualization, illustrate that our proposed intrinsic reward can guide agents' policies to novel or promising states, thus enabling effective coordinated exploration. Empirical results on more complicated StarCraft II tasks show that EMC significantly outperforms other state-of-the-art multi-agent baselines. |
| Researcher Affiliation | Collaboration | Lulu Zheng 1, Jiarui Chen 2,3, Jianhao Wang 1, Jiamin He 4, Yujing Hu 3, Yingfeng Chen 3, Changjie Fan 3, Yang Gao 2, Chongjie Zhang 1. (1) Institute for Interdisciplinary Information Sciences, Tsinghua University, China; (2) Department of Computer Science and Technology, Nanjing University, China; (3) Fuxi AI Lab, NetEase, China; (4) Department of Computing Science, University of Alberta, Canada |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide a specific link or explicit statement about the release of source code for the described methodology. |
| Open Datasets | Yes | We conduct experiments in 17 benchmark tasks of StarCraft II, which comprise 14 popular tasks proposed by SMAC [8] and three more super hard cooperative tasks proposed by QPLEX [7]. |
| Dataset Splits | No | The paper does not provide specific dataset split information (exact percentages, sample counts, or detailed splitting methodology) for training, validation, and test sets. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory amounts) used for running its experiments. |
| Software Dependencies | No | The paper does not provide specific ancillary software details (e.g., library or solver names with version numbers) needed to replicate the experiment. |
| Experiment Setup | Yes | The two agents will receive a positive global reward r = 10 if and only if they arrive at the corresponding goal grid (denoted by the character G in Figure 3) at the same time. If only one arrives, the miscoordination is punished by a negative reward p. |
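The coordination reward quoted in the Experiment Setup row can be sketched as a small function. This is a minimal illustration, not the authors' implementation: the penalty magnitude `p` is a free parameter here (the paper varies it rather than fixing one value), and the function/argument names are hypothetical.

```python
def coordination_reward(agent1_at_goal: bool, agent2_at_goal: bool, p: float) -> float:
    """Global reward for the didactic gridworld described in the paper:
    +10 only when both agents reach their goal grids at the same time,
    -p when exactly one arrives (miscoordination is punished),
    0 otherwise.
    """
    if agent1_at_goal and agent2_at_goal:
        return 10.0
    if agent1_at_goal or agent2_at_goal:
        return -p
    return 0.0
```

Because only simultaneous arrival is rewarded and one-sided arrival is penalized, independent greedy exploration is discouraged, which is exactly the coordination challenge the didactic example is built to expose.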