Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Cooperative Exploration for Multi-Agent Deep Reinforcement Learning
Authors: Iou-Jen Liu, Unnat Jain, Raymond A Yeh, Alexander Schwing
ICML 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate CMAE on two challenging environments: (1) a discrete version of the multiple-particle environment (MPE) (Lowe et al., 2017; Wang et al., 2020); and (2) the Starcraft multi-agent challenge (SMAC) (Samvelyan et al., 2019). Table 1. Final metric of episode rewards of CMAE and baselines on sparse-reward (top) and dense-reward (bottom) MPE tasks. Figure 1. Training curves on sparse-reward and dense-reward MPE tasks. |
| Researcher Affiliation | Academia | ¹University of Illinois at Urbana-Champaign, IL, U.S.A. |
| Pseudocode | Yes | Algorithm 1: Training with Coordinated Multi-Agent Exploration (CMAE). Init: space tree $T_{\text{space}}$, counters $c$; Init: exploration policies $\mu = \{\mu_i\}_{i=1}^{n}$, target policies $\pi = \{\pi_i\}_{i=1}^{n}$, replay buffer $D$. Algorithm 2: Train Exploration Policies (Train Exp). Input: exploration policies $\mu = \{\mu_i\}_{i=1}^{n}$, shared goal $g$, replay buffer $D$. Algorithm 3: Select Restricted Space and Shared Goal (Select Restricted Space Goal). Input: counters $c$, space tree $T_{\text{space}}$, replay buffer $D$, episode; output: selected goal $g$. |
| Open Source Code | No | For more, please see our project page: https://ioujenliu.github.io/CMAE. This provides a project page link, but not an explicit statement of source code release or a direct link to a code repository. |
| Open Datasets | Yes | a discrete version of the multiple-particle environment (MPE) (Lowe et al., 2017; Wang et al., 2020) and the Starcraft multi-agent challenge (SMAC) (Samvelyan et al., 2019). |
| Dataset Splits | No | The paper mentions evaluating on an 'independent evaluation environment' and using 'evaluation episodes' but does not specify distinct training, validation, and test dataset splits with percentages or sample counts. |
| Hardware Specification | No | The paper does not explicitly describe the specific hardware (e.g., GPU models, CPU types, memory) used to run the experiments. |
| Software Dependencies | No | The paper mentions combining CMAE with Q-learning and QMIX, and using publicly available code for EITI, EDTI, and weighted QMIX, but does not provide specific version numbers for any software or libraries. |
| Experiment Setup | No | The paper does not explicitly provide concrete hyperparameter values (e.g., learning rate, batch size, number of epochs) or specific system-level training configurations in the main text. |