Cooperative Exploration for Multi-Agent Deep Reinforcement Learning

Authors: Iou-Jen Liu, Unnat Jain, Raymond A Yeh, Alexander Schwing

ICML 2021

Reproducibility Assessment: variable, result, and supporting LLM response for each criterion.
Research Type: Experimental. Evidence: "We evaluate CMAE on two challenging environments: (1) a discrete version of the multiple-particle environment (MPE) (Lowe et al., 2017; Wang et al., 2020); and (2) the StarCraft multi-agent challenge (SMAC) (Samvelyan et al., 2019)." Supporting results: Table 1 (final episode rewards of CMAE and baselines on sparse-reward and dense-reward MPE tasks) and Figure 1 (training curves on sparse-reward and dense-reward MPE tasks).
Researcher Affiliation: Academia. Evidence: "University of Illinois at Urbana-Champaign, IL, U.S.A."
Pseudocode: Yes. Evidence: Algorithm 1: Training with Coordinated Multi-Agent Exploration (CMAE). Init: space tree T_space, counters c; exploration policies µ = {µ_i}_{i=1}^{n}, target policies π = {π_i}_{i=1}^{n}, replay buffer D. Algorithm 2: Train Exploration Policies (TrainExp). Input: exploration policies µ = {µ_i}_{i=1}^{n}, shared goal g, replay buffer D. Algorithm 3: Select Restricted Space and Shared Goal (SelectRestrictedSpaceGoal). Input: counters c, space tree T_space, replay buffer D, episode. Output: selected goal g.
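The three algorithms listed above fit together as one training loop: a shared goal is selected from under-visited states (Algorithm 3), the exploration policies are trained toward that goal (Algorithm 2), and the outer loop alternates goal-directed data collection with policy updates on a shared replay buffer (Algorithm 1). A minimal Python sketch of that control flow, assuming a count-based stand-in for restricted-space goal selection and placeholder policy updates (all function and variable names here are illustrative, not the paper's exact procedure):

```python
from collections import defaultdict

def select_restricted_space_goal(counters, buffer):
    """Algorithm 3 (sketch): pick the least-visited state seen in the
    replay buffer as the shared exploration goal (count-based proxy)."""
    states = {tuple(s) for (s, a, r, s2) in buffer}
    return min(states, key=lambda s: counters[s])

def train_exploration_policies(mu, goal, buffer):
    """Algorithm 2 (sketch): reward exploration policies for reaching
    the shared goal; the actual gradient update is elided."""
    for s, a, r, s2 in buffer:
        bonus = 1.0 if tuple(s2) == goal else 0.0
        # ... Q-learning update of mu with reward r + bonus would go here

def cmae_training_loop(env, mu, pi, episodes=10):
    """Algorithm 1 (sketch): alternate exploration-driven data collection
    with training on the shared replay buffer."""
    counters = defaultdict(int)   # state visitation counts c
    buffer = []                   # replay buffer D
    for _ in range(episodes):
        s, done = env.reset(), False
        while not done:
            a = [policy(s) for policy in mu]   # joint exploration action
            s2, r, done = env.step(a)
            buffer.append((s, a, r, s2))
            counters[tuple(s2)] += 1
            s = s2
        goal = select_restricted_space_goal(counters, buffer)
        train_exploration_policies(mu, goal, buffer)
        # ... target policies pi would also be trained on buffer here
    return buffer, counters
```

The key design point the pseudocode conveys is the separation between exploration policies µ (trained with goal-reaching bonuses) and target policies π (trained on the same off-policy data toward the task reward).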
Open Source Code: No. Evidence: "For more, please see our project page: https://ioujenliu.github.io/CMAE." This provides a project-page link, but no explicit statement of source-code release or a direct link to a code repository.
Open Datasets: Yes. Evidence: "a discrete version of the multiple-particle environment (MPE) (Lowe et al., 2017; Wang et al., 2020)" and "the StarCraft multi-agent challenge (SMAC) (Samvelyan et al., 2019)".
Dataset Splits: No. The paper mentions evaluating on an "independent evaluation environment" and using "evaluation episodes" but does not specify distinct training, validation, and test splits with percentages or sample counts.
Hardware Specification: No. The paper does not describe the specific hardware (e.g., GPU models, CPU types, memory) used to run the experiments.
Software Dependencies: No. The paper mentions combining CMAE with Q-learning and QMIX, and using publicly available code for EITI, EDTI, and weighted QMIX, but provides no version numbers for any software or libraries.
Experiment Setup: No. The main text does not provide concrete hyperparameter values (e.g., learning rate, batch size, number of epochs) or specific system-level training configurations.