Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
DM²: Decentralized Multi-Agent Reinforcement Learning via Distribution Matching
Authors: Caroline Wang, Ishan Durugkar, Elad Liebman, Peter Stone
AAAI 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental validation on the StarCraft domain shows that combining (1) a task reward, and (2) a distribution matching reward for expert demonstrations for the same task, allows agents to outperform a naive distributed baseline. Additional experiments probe the conditions under which expert demonstrations need to be sampled to obtain the learning benefits. |
| Researcher Affiliation | Collaboration | ¹The University of Texas at Austin; ²SparkCognition Research; ³Sony AI |
| Pseudocode | Yes | Algorithm 1: DM2 (Decentralized MARL via distribution matching) |
| Open Source Code | Yes | The code is provided at https://github.com/carolinewang01/dm2. |
| Open Datasets | Yes | Experiments were conducted on the StarCraft Multi-Agent Challenge domain (Samvelyan et al. 2019). |
| Dataset Splits | No | The paper mentions evaluating on test episodes and using demonstration data, but does not specify explicit training/validation/test dataset splits (e.g., percentages or counts) for their main learning process. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU or CPU models, processor types, or memory specifications used for running its experiments. |
| Software Dependencies | No | The paper mentions using PPO, QMIX, RMAPPO, and GAIL, but does not provide specific version numbers for these or any other software dependencies. |
| Experiment Setup | Yes | Experimental details such as hyperparameters are specified in Appendix C. |
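
The Research Type entry above describes combining a task reward with a distribution matching reward for expert demonstrations. The sketch below illustrates one way such a combination could look for a single agent at a single step. It is not taken from the paper or its repository: the function names, the `weight` parameter, and the choice of the `-log(1 - D)` GAIL-style reward form are all assumptions made for illustration.

```python
import math


def gail_style_reward(disc_prob: float, eps: float = 1e-8) -> float:
    """Hypothetical helper: turn a discriminator probability into a reward.

    disc_prob is the discriminator's estimate that a sample looks like an
    expert demonstration. The -log(1 - D) form is one common GAIL-style
    choice and is an assumption here, not the paper's stated formula.
    """
    # Clamp to avoid log(0) when the discriminator is fully confident.
    return -math.log(max(1.0 - disc_prob, eps))


def combined_reward(task_reward: float, disc_prob: float, weight: float = 1.0) -> float:
    """Hypothetical helper: add a weighted distribution matching term
    to the environment's task reward."""
    return task_reward + weight * gail_style_reward(disc_prob)


# Example: an agent receives task reward 1.0 and its sample is judged
# 50% likely to be expert-like by its discriminator.
r = combined_reward(task_reward=1.0, disc_prob=0.5, weight=0.5)
```

With `weight=0.0` the agent learns from the task reward alone. Larger weights push each agent harder toward matching the demonstration distribution.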