Cooperative Exploration for Multi-Agent Deep Reinforcement Learning

Authors: Iou-Jen Liu, Unnat Jain, Raymond A Yeh, Alexander Schwing

ICML 2021

Reproducibility Assessment: variable, result, and supporting LLM response for each criterion.
Research Type: Experimental. Evidence: "We evaluate CMAE on two challenging environments: (1) a discrete version of the multiple-particle environment (MPE) (Lowe et al., 2017; Wang et al., 2020); and (2) the StarCraft multi-agent challenge (SMAC) (Samvelyan et al., 2019)." Supporting results: Table 1 (final episode rewards of CMAE and baselines on sparse-reward and dense-reward MPE tasks) and Figure 1 (training curves on sparse-reward and dense-reward MPE tasks).
Researcher Affiliation: Academia. Evidence: "University of Illinois at Urbana-Champaign, IL, U.S.A."
Pseudocode: Yes. Evidence: Algorithm 1: Training with Coordinated Multi-Agent Exploration (CMAE). Init: space tree T_space, counters c; exploration policies µ = {µ_i}_{i=1}^{n}, target policies π = {π_i}_{i=1}^{n}, replay buffer D. Algorithm 2: Train Exploration Policies (TrainExp). Input: exploration policies µ = {µ_i}_{i=1}^{n}, shared goal g, replay buffer D. Algorithm 3: Select Restricted Space and Shared Goal (SelectRestrictedSpaceGoal). Input: counters c, space tree T_space, replay buffer D, episode. Output: selected goal g.
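The three algorithms listed above fit together as one training loop: a shared goal is selected from under-visited states (Algorithm 3), the exploration policies are trained toward that goal (Algorithm 2), and the outer loop alternates goal-directed data collection with policy updates on a shared replay buffer (Algorithm 1). A minimal Python sketch of that control flow, assuming a count-based stand-in for restricted-space goal selection and placeholder policy updates (all function and variable names here are illustrative, not the paper's exact procedure):

```python
from collections import defaultdict

def select_restricted_space_goal(counters, buffer):
    """Algorithm 3 (sketch): pick the least-visited state seen in the
    replay buffer as the shared exploration goal (count-based proxy)."""
    states = {tuple(s) for (s, a, r, s2) in buffer}
    return min(states, key=lambda s: counters[s])

def train_exploration_policies(mu, goal, buffer):
    """Algorithm 2 (sketch): reward exploration policies for reaching
    the shared goal; the actual gradient update is elided."""
    for s, a, r, s2 in buffer:
        bonus = 1.0 if tuple(s2) == goal else 0.0
        # ... Q-learning update of mu with reward r + bonus would go here

def cmae_training_loop(env, mu, pi, episodes=10):
    """Algorithm 1 (sketch): alternate exploration-driven data collection
    with training on the shared replay buffer."""
    counters = defaultdict(int)   # state visitation counts c
    buffer = []                   # replay buffer D
    for _ in range(episodes):
        s, done = env.reset(), False
        while not done:
            a = [policy(s) for policy in mu]   # joint exploration action
            s2, r, done = env.step(a)
            buffer.append((s, a, r, s2))
            counters[tuple(s2)] += 1
            s = s2
        goal = select_restricted_space_goal(counters, buffer)
        train_exploration_policies(mu, goal, buffer)
        # ... target policies pi would also be trained on buffer here
    return buffer, counters
```

The key design point the pseudocode conveys is the separation between exploration policies µ (trained with goal-reaching bonuses) and target policies π (trained on the same off-policy data toward the task reward).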
Open Source Code: No. Evidence: "For more, please see our project page: https://ioujenliu.github.io/CMAE." This provides a project-page link, but no explicit statement of source-code release or a direct link to a code repository.
Open Datasets: Yes. Evidence: "a discrete version of the multiple-particle environment (MPE) (Lowe et al., 2017; Wang et al., 2020)" and "the StarCraft multi-agent challenge (SMAC) (Samvelyan et al., 2019)".
Dataset Splits: No. The paper mentions evaluating on an "independent evaluation environment" and using "evaluation episodes" but does not specify distinct training, validation, and test splits with percentages or sample counts.
Hardware Specification: No. The paper does not describe the specific hardware (e.g., GPU models, CPU types, memory) used to run the experiments.
Software Dependencies: No. The paper mentions combining CMAE with Q-learning and QMIX, and using publicly available code for EITI, EDTI, and weighted QMIX, but provides no version numbers for any software or libraries.
Experiment Setup: No. The main text does not provide concrete hyperparameter values (e.g., learning rate, batch size, number of epochs) or specific system-level training configurations.