Distributional Reward Estimation for Effective Multi-agent Deep Reinforcement Learning

Authors: Jifeng Hu, Yanchao Sun, Hechang Chen, Sili Huang, Haiyin Piao, Yi Chang, Lichao Sun

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | The superiority of the DRE-MARL is demonstrated using benchmark multi-agent scenarios, compared with the SOTA baselines in terms of both effectiveness and robustness.
Researcher Affiliation | Academia | 1,3,4,6 School of Artificial Intelligence, Jilin University, Changchun, China; 2 Department of Computer Science, University of Maryland, College Park, MD 20742, USA; 5 Northwestern Polytechnical University, Xi'an, China; 7 Lehigh University, Bethlehem, Pennsylvania, USA
Pseudocode | Yes | Further details and the full algorithm for optimizing DRE-MARL can be found in Appendix A.
Open Source Code | No | The paper does not provide an explicit statement or link for open-sourcing the code of the described methodology.
Open Datasets | Yes | To demonstrate the effectiveness of the proposed method, we provide experimental results in several benchmark MARL environments based on the multi-agent particle environments [22] (MPE) and several variants of MPE.
Dataset Splits | No | The paper mentions the use of multi-agent particle environments (MPE) but does not provide specific details on how the data were split into training, validation, and test sets; it only states that "we set the episode length as 25 for all experiments".
Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU or GPU models, or memory) used for running the experiments.
Software Dependencies | No | The paper does not provide specific software dependencies (e.g., library names with version numbers) needed to replicate the experiments.
Experiment Setup | Yes | For all methods we consider, we report the final performance under different reward settings (r_dete, r_dist, r_ac_dist), which are introduced above. Following prior studies [13, 22], we set the episode length as 25 for all experiments and use the mean episode reward as the evaluation metric. (A minimal evaluation sketch follows the table.)
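Since the Experiment Setup row only quotes the episode length (25 steps) and the evaluation metric (mean episode reward), a minimal sketch of how such an evaluation loop could look is given below. It assumes a recent PettingZoo port of the MPE suite; the simple_spread_v3 scenario, the random placeholder policy, and the episode count are illustrative assumptions, not the authors' code or configuration.

```python
# Sketch of the evaluation protocol quoted above: 25-step MPE episodes
# scored by mean episode reward. Assumes a recent PettingZoo release
# (reset/step signatures differ across older versions); the scenario and
# random policy below are placeholders, not the paper's actual setup.
import numpy as np
from pettingzoo.mpe import simple_spread_v3


def evaluate(num_episodes: int = 100, episode_length: int = 25) -> float:
    env = simple_spread_v3.parallel_env(max_cycles=episode_length)
    episode_rewards = []

    for ep in range(num_episodes):
        observations, infos = env.reset(seed=ep)
        total_reward = 0.0

        # The parallel API empties env.agents once every agent is
        # terminated or truncated (here, after max_cycles steps).
        while env.agents:
            # Placeholder policy: sample a random action for each live agent.
            actions = {agent: env.action_space(agent).sample() for agent in env.agents}
            observations, rewards, terminations, truncations, infos = env.step(actions)
            total_reward += sum(rewards.values())

        episode_rewards.append(total_reward)

    env.close()
    # Evaluation metric named in the table above: mean episode reward.
    return float(np.mean(episode_rewards))


if __name__ == "__main__":
    print("mean episode reward:", evaluate())
```

In a faithful reproduction, the random placeholder policy would be replaced by the trained DRE-MARL (or baseline) policies, evaluated under each of the reward settings (r_dete, r_dist, r_ac_dist) reported in the paper.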