DM²: Decentralized Multi-Agent Reinforcement Learning via Distribution Matching
Authors: Caroline Wang, Ishan Durugkar, Elad Liebman, Peter Stone
AAAI 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental validation on the StarCraft domain shows that combining (1) a task reward and (2) a distribution-matching reward derived from expert demonstrations of the same task allows agents to outperform a naive distributed baseline (see the sketch after the table). Additional experiments probe the conditions under which the expert demonstrations must be sampled for these learning benefits to appear. |
| Researcher Affiliation | Collaboration | ¹The University of Texas at Austin, ²SparkCognition Research, ³Sony AI; caroline.l.wang@utexas.edu, ishand@cs.utexas.edu, eliebman@sparkcognition.com, pstone@cs.utexas.edu |
| Pseudocode | Yes | Algorithm 1: DM² (Decentralized MARL via distribution matching) |
| Open Source Code | Yes | The code is provided at https://github.com/carolinewang01/dm2. |
| Open Datasets | Yes | Experiments were conducted on the StarCraft Multi-Agent Challenge domain (Samvelyan et al. 2019). |
| Dataset Splits | No | The paper mentions evaluating on test episodes and using demonstration data, but does not specify explicit training/validation/test dataset splits (e.g., percentages or counts) for its main learning process. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU or CPU models, processor types, or memory specifications used for running its experiments. |
| Software Dependencies | No | The paper mentions using PPO, QMIX, RMAPPO, and GAIL, but does not provide specific version numbers for these or any other software dependencies. |
| Experiment Setup | Yes | Experimental details such as hyperparameters are specified in Appendix C. |
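To make the Research Type and Pseudocode rows concrete: below is a minimal, hypothetical sketch of the reward combination at the heart of DM². Each decentralized agent optimizes its environment reward plus a GAIL-style distribution-matching bonus computed by a per-agent discriminator trained to distinguish the agent's visited states from expert demonstration states. The `Discriminator` class, the `combined_reward` helper, and the mixing coefficient `lam` are illustrative assumptions, not the authors' implementation; the actual method is Algorithm 1 in the paper, with the real code released at https://github.com/carolinewang01/dm2.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class Discriminator(nn.Module):
    """Per-agent discriminator: expert states vs. the agent's own states.

    Hypothetical architecture; the paper's network details are in Appendix C.
    """

    def __init__(self, state_dim: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        # Returns a logit; sigmoid(logit) ~ P(state came from the expert).
        return self.net(state)


def combined_reward(env_reward: float, disc: Discriminator,
                    state: torch.Tensor, lam: float = 0.5) -> float:
    """Task reward plus a GAIL-style distribution-matching bonus.

    `lam` is an assumed mixing coefficient, not a value from the paper.
    """
    with torch.no_grad():
        logit = disc(state)
        # -log(1 - sigmoid(logit)) == softplus(logit): higher when the
        # discriminator judges the state to be expert-like.
        dm_bonus = F.softplus(logit)
    return env_reward + lam * dm_bonus.item()


if __name__ == "__main__":
    # Toy usage with a random state; in DM² each agent would have its own
    # discriminator, updated online against the demonstration buffer.
    disc = Discriminator(state_dim=8)
    s = torch.randn(8)
    print(combined_reward(env_reward=1.0, disc=disc, state=s))
```

Under this reading, decentralization comes from each agent training its own discriminator and policy on local information only; the distribution-matching bonus is what couples the agents' behavior to the jointly generated expert demonstrations.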