Cooperative Exploration for Multi-Agent Deep Reinforcement Learning
Authors: Iou-Jen Liu, Unnat Jain, Raymond A. Yeh, Alexander Schwing
ICML 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate CMAE on two challenging environments: (1) a discrete version of the multiple-particle environment (MPE) (Lowe et al., 2017; Wang et al., 2020); and (2) the Starcraft multi-agent challenge (SMAC) (Samvelyan et al., 2019). Table 1 (caption): Final metric of episode rewards of CMAE and baselines on sparse-reward (top) and dense-reward (bottom) MPE tasks. Figure 1 (caption): Training curves on sparse-reward and dense-reward MPE tasks. |
| Researcher Affiliation | Academia | University of Illinois at Urbana-Champaign, IL, U.S.A. |
| Pseudocode | Yes | Algorithm 1: Training with Coordinated Multi-Agent Exploration (CMAE). Init: space tree T_space, counters c; exploration policies µ = {µ_i}_{i=1}^n, target policies π = {π_i}_{i=1}^n, replay buffer D. Algorithm 2: Train Exploration Policies (TrainExp). Input: exploration policies µ = {µ_i}_{i=1}^n, shared goal g, replay buffer D. Algorithm 3: Select Restricted Space and Shared Goal (SelectRestrictedSpaceGoal). Input: counters c, space tree T_space, replay buffer D, episode. Output: selected goal g. (An illustrative sketch of this loop follows the table.) |
| Open Source Code | No | For more, please see our project page: https://ioujenliu.github.io/CMAE. This provides a project page link, but not an explicit statement of source code release or a direct link to a code repository. |
| Open Datasets | Yes | a discrete version of the multiple-particle environment (MPE) (Lowe et al., 2017; Wang et al., 2020) and the Starcraft multi-agent challenge (SMAC) (Samvelyan et al., 2019). |
| Dataset Splits | No | The paper mentions evaluating on an 'independent evaluation environment' and using 'evaluation episodes' but does not specify distinct training, validation, and test dataset splits with percentages or sample counts. |
| Hardware Specification | No | The paper does not explicitly describe the specific hardware (e.g., GPU models, CPU types, memory) used to run the experiments. |
| Software Dependencies | No | The paper mentions combining CMAE with Q-learning and QMIX, and using publicly available code for EITI, EDTI, and weighted QMIX, but does not provide specific version numbers for any software or libraries. |
| Experiment Setup | No | The paper does not explicitly provide concrete hyperparameter values (e.g., learning rate, batch size, number of epochs) or specific system-level training configurations in the main text. |
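
The Pseudocode row above summarizes Algorithms 1–3: the overall CMAE training loop, training exploration policies toward a shared goal, and selecting a restricted space and shared goal from visit counters. The sketch below shows one way those pieces fit together. It is not the authors' implementation: the toy two-agent environment, the tabular joint Q-learning updates, and the collapse of the restricted-space tree T_space onto the full joint state space are all simplifying assumptions made for illustration.

```python
# Minimal sketch of a CMAE-style training loop (illustrative assumptions only).
import random
from collections import defaultdict
from itertools import product


class ToyTwoAgentEnv:
    """Two agents on a line of `size` cells; a sparse external reward is given
    only when both agents occupy the last cell (a stand-in for sparse MPE tasks)."""

    def __init__(self, size=5, horizon=20):
        self.size, self.horizon = size, horizon

    def reset(self):
        self.pos, self.t = [0, 0], 0
        return tuple(self.pos)

    def step(self, actions):
        # actions: a pair of moves in {-1, 0, +1}, one per agent
        self.t += 1
        for i, a in enumerate(actions):
            self.pos[i] = min(max(self.pos[i] + a, 0), self.size - 1)
        state = tuple(self.pos)
        reward = 1.0 if all(p == self.size - 1 for p in self.pos) else 0.0
        done = reward > 0 or self.t >= self.horizon
        return state, reward, done


JOINT_ACTIONS = list(product((-1, 0, 1), repeat=2))


def act_eps_greedy(q, state, eps=0.1):
    if random.random() < eps:
        return random.choice(JOINT_ACTIONS)
    return max(JOINT_ACTIONS, key=lambda a: q[(state, a)])


def q_update(q, s, a, r, s_next, alpha=0.5, gamma=0.95):
    best_next = max(q[(s_next, a_next)] for a_next in JOINT_ACTIONS)
    q[(s, a)] += alpha * (r + gamma * best_next - q[(s, a)])


def select_restricted_space_goal(counts):
    """Pick the least-visited state as the shared goal g.
    (The paper selects goals from low-dimensional restricted spaces organized
    in a space tree T_space; here the restricted space is simply the full
    joint state space to keep the sketch short.)"""
    return min(counts, key=counts.get) if counts else None


def run_cmae(episodes=300):
    env = ToyTwoAgentEnv()
    q_explore = defaultdict(float)  # exploration policies mu (joint, tabular)
    q_target = defaultdict(float)   # target policies pi
    counts = defaultdict(int)       # counters c over visited states

    for ep in range(episodes):
        goal = select_restricted_space_goal(counts)      # Algorithm 3 (simplified)
        behave = q_explore if ep % 2 == 0 else q_target  # alternate rollout policies
        s, done = env.reset(), False
        while not done:
            a = act_eps_greedy(behave, s)
            s_next, r, done = env.step(a)
            counts[s_next] += 1
            bonus = 1.0 if goal is not None and s_next == goal else 0.0
            # Algorithm 2 (simplified): exploration policies are rewarded for
            # reaching the shared goal; target policies use external reward only.
            q_update(q_explore, s, a, r + bonus, s_next)
            q_update(q_target, s, a, r, s_next)
            s = s_next
    return q_target


if __name__ == "__main__":
    q = run_cmae()
    best = max(JOINT_ACTIONS, key=lambda a: q[((0, 0), a)])
    print("Greedy joint action at the start state:", best)
```

In the paper, exploration and target policies share experience through the replay buffer D, and goals move to progressively larger restricted spaces via the space tree; the sketch collapses these into updating both Q-tables from every transition and keeping a single visit-count table.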