Distributional Reward Estimation for Effective Multi-agent Deep Reinforcement Learning

Authors: Jifeng Hu, Yanchao Sun, Hechang Chen, Sili Huang, Haiyin Piao, Yi Chang, Lichao Sun

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | The superiority of the DRE-MARL is demonstrated using benchmark multi-agent scenarios, compared with the SOTA baselines in terms of both effectiveness and robustness.
Researcher Affiliation | Academia | 1,3,4,6 School of Artificial Intelligence, Jilin University, Changchun, China; 2 Department of Computer Science, University of Maryland, College Park, MD 20742, USA; 5 Northwestern Polytechnical University, Xi'an, China; 7 Lehigh University, Bethlehem, Pennsylvania, USA
Pseudocode | Yes | Further details and the full algorithm for optimizing DRE-MARL can be found in Appendix A.
Open Source Code | No | The paper does not provide an explicit statement or link for open-sourcing the code of the described methodology.
Open Datasets | Yes | To demonstrate the effectiveness of the proposed method, we provide experimental results in several benchmark MARL environments based on the multi-agent particle environments [22] (MPE) and several variants of MPE.
Dataset Splits | No | The paper mentions the use of multi-agent particle environments (MPE) but does not provide specific details on how the data were split into training, validation, and test sets; it only states that "we set the episode length as 25 for all experiments".
Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU or GPU models, or memory) used for running the experiments.
Software Dependencies | No | The paper does not provide specific software dependencies (e.g., library names with version numbers) needed to replicate the experiments.
Experiment Setup | Yes | For all methods we consider, we report the final performance under different reward settings (r_dete, r_dist, r_ac_dist), which are introduced above. Following prior studies [13, 22], we set the episode length as 25 for all experiments and use the mean episode reward as the evaluation metric. (A minimal evaluation sketch follows the table.)
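Since the Experiment Setup row only quotes the episode length (25 steps) and the evaluation metric (mean episode reward), a minimal sketch of how such an evaluation loop could look is given below. It assumes a recent PettingZoo port of the MPE suite; the simple_spread_v3 scenario, the random placeholder policy, and the episode count are illustrative assumptions, not the authors' code or configuration.

```python
# Sketch of the evaluation protocol quoted above: 25-step MPE episodes
# scored by mean episode reward. Assumes a recent PettingZoo release
# (reset/step signatures differ across older versions); the scenario and
# random policy below are placeholders, not the paper's actual setup.
import numpy as np
from pettingzoo.mpe import simple_spread_v3


def evaluate(num_episodes: int = 100, episode_length: int = 25) -> float:
    env = simple_spread_v3.parallel_env(max_cycles=episode_length)
    episode_rewards = []

    for ep in range(num_episodes):
        observations, infos = env.reset(seed=ep)
        total_reward = 0.0

        # The parallel API empties env.agents once every agent is
        # terminated or truncated (here, after max_cycles steps).
        while env.agents:
            # Placeholder policy: sample a random action for each live agent.
            actions = {agent: env.action_space(agent).sample() for agent in env.agents}
            observations, rewards, terminations, truncations, infos = env.step(actions)
            total_reward += sum(rewards.values())

        episode_rewards.append(total_reward)

    env.close()
    # Evaluation metric named in the table above: mean episode reward.
    return float(np.mean(episode_rewards))


if __name__ == "__main__":
    print("mean episode reward:", evaluate())
```

In a faithful reproduction, the random placeholder policy would be replaced by the trained DRE-MARL (or baseline) policies, evaluated under each of the reward settings (r_dete, r_dist, r_ac_dist) reported in the paper.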