Distributional Reward Estimation for Effective Multi-agent Deep Reinforcement Learning
Authors: Jifeng Hu, Yanchao Sun, Hechang Chen, Sili Huang, Haiyin Piao, Yi Chang, Lichao Sun
NeurIPS 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The superiority of the DRE-MARL is demonstrated using benchmark multi-agent scenarios, compared with the SOTA baselines in terms of both effectiveness and robustness. |
| Researcher Affiliation | Academia | 1,3,4,6 School of Artificial Intelligence, Jilin University, Changchun, China; 2 Department of Computer Science, University of Maryland, College Park, MD 20742, USA; 5 Northwestern Polytechnical University, Xi'an, China; 7 Lehigh University, Bethlehem, Pennsylvania, USA |
| Pseudocode | Yes | Further details and the full algorithm for optimizing DRE-MARL can be found in Appendix A. |
| Open Source Code | No | The paper does not provide an explicit statement or link for open-sourcing the code for the described methodology. |
| Open Datasets | Yes | To demonstrate the effectiveness of the proposed method, we provide experimental results in several benchmark MARL environments based on the multi-agent particle environments [22] (MPE) and several variants of MPE. |
| Dataset Splits | No | The paper mentions the use of multi-agent particle environments (MPE) but does not provide specific details on how the datasets were split into training, validation, and test sets. It only states, "we set the episode length as 25 for all experiments". |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU, GPU models, or memory) used for running the experiments. |
| Software Dependencies | No | The paper does not provide specific software dependencies (e.g., library names with version numbers) needed to replicate the experiments. |
| Experiment Setup | Yes | For all methods we consider, we report the final performance under different reward settings (r_dete, r_dist, r_ac-dist), which are introduced above. Following prior studies [13, 22], we set the episode length as 25 for all experiments and use the mean episode reward as the evaluation metric. (A hedged sketch of this evaluation protocol follows the table.) |
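
The Open Datasets and Experiment Setup rows name the concrete, reproducible elements: the MPE benchmark, episodes fixed at 25 steps, and mean episode reward as the evaluation metric. Below is a minimal sketch of that protocol, assuming a PettingZoo-style MPE wrapper (`simple_spread_v3`), placeholder random policies, and an arbitrary episode count; none of these choices comes from the paper, whose code is not released.

```python
# Hedged sketch of the quoted evaluation protocol: fixed-length episodes (25 steps)
# on an MPE scenario, scored by mean episode reward. The PettingZoo MPE wrapper,
# the simple_spread scenario, the episode count, and the random policies are all
# assumptions for illustration, not the authors' implementation.
import numpy as np
from pettingzoo.mpe import simple_spread_v3  # assumes a recent PettingZoo parallel API


def evaluate(env, policies, n_episodes=100, episode_length=25, seed=0):
    """Return the mean episode reward over `n_episodes` fixed-length episodes."""
    episode_returns = []
    for ep in range(n_episodes):
        obs, _ = env.reset(seed=seed + ep)        # parallel API: (observations, infos)
        ep_return = 0.0
        for _ in range(episode_length):           # episode length fixed to 25, as in the paper
            actions = {agent: policies[agent](obs[agent]) for agent in env.agents}
            obs, rewards, terminations, truncations, _ = env.step(actions)
            ep_return += sum(rewards.values())     # sum per-step rewards over agents
            if not env.agents:                     # all agents terminated or truncated
                break
        episode_returns.append(ep_return)
    return float(np.mean(episode_returns))         # mean episode reward


if __name__ == "__main__":
    env = simple_spread_v3.parallel_env(N=3, max_cycles=25)
    env.reset(seed=0)
    # Placeholder random policies; a trained DRE-MARL agent would supply learned policies here.
    policies = {a: (lambda obs, a=a: env.action_space(a).sample()) for a in env.agents}
    print("mean episode reward:", evaluate(env, policies, n_episodes=10))
```

Summing per-step rewards over agents and then averaging over episodes is the common MPE convention for "mean episode reward"; the paper does not spell out the exact aggregation, so this detail is an assumption.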