MDPGT: Momentum-Based Decentralized Policy Gradient Tracking

Authors: Zhanhong Jiang, Xian Yeow Lee, Sin Yong Tan, Kai Liang Tan, Aditya Balu, Young M Lee, Chinmay Hegde, Soumik Sarkar

AAAI 2022, pp. 9377-9385 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Lastly, we provide empirical results on a multi-agent reinforcement learning benchmark environment to support our theoretical findings.
Researcher Affiliation | Collaboration | (1) Johnson Controls Inc., 507 East Michigan St, Milwaukee, WI 53202; (2) Iowa State University, Ames, IA 50010; (3) New York University, 6 Metro Tech Center, Brooklyn, NY 11201
Pseudocode | Yes | Algorithm 1: MDPGT (a hedged sketch of the update appears below the table)
Open Source Code | Yes | Codes to reproduce results are also available at: https://github.com/xylee95/MD-PGT
Open Datasets | Yes | we performed experiments on a cooperative navigation multi-agent environment that has been commonly used as a benchmark in several previous works (Qu et al. 2019; Zhang et al. 2018; Lu et al. 2021). Our platform for cooperative navigation is derived off the particle environment introduced by (Lowe et al. 2017).
Dataset Splits | No | The paper mentions training for '50k episodes with a horizon of 50 steps and discount factor of 0.99' and evaluation based on 'average training rewards', but it does not specify explicit training, validation, and test dataset splits or percentages.
Hardware Specification | No | The paper states that 'computing infrastructure details are available in the supplementary materials', but no specific hardware specifications (such as GPU models, CPU types, or memory) are provided in the main text.
Software Dependencies | No | The paper describes the policy as a '3-layer dense neural network' and refers to Python in the context of code availability, but it does not specify any software dependencies with version numbers (e.g., Python 3.x, PyTorch 1.x).
Experiment Setup | Yes | The agents were trained for 50k episodes with a horizon of 50 steps and discount factor of 0.99. All agents' policies are represented by a 3-layer dense neural network with 64 hidden units and tanh activation functions. (A hedged sketch of such a policy network follows below.)
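
For orientation, below is a minimal sketch of the kind of momentum-based decentralized policy gradient tracking update that the title and the Algorithm 1 row refer to. It is not the authors' exact Algorithm 1: the function and variable names, the STORM-style momentum surrogate, the importance-weighted gradient term, and the doubly stochastic mixing matrix W are assumptions reconstructed from the standard gradient-tracking template.

```python
import numpy as np

def mdpgt_round(theta, v, u_prev, grads, grads_prev_iw, W, beta, eta):
    """One synchronous round of a momentum-based decentralized policy
    gradient tracking update for N agents with d flattened parameters.

    theta         : (N, d) current policy parameters, one row per agent
    v             : (N, d) gradient-tracking variables
    u_prev        : (N, d) previous momentum-based gradient surrogates
    grads         : (N, d) stochastic policy gradients at the current parameters
    grads_prev_iw : (N, d) importance-weighted gradients at the previous
                    parameters, evaluated on the newly sampled trajectories
    W             : (N, N) doubly stochastic mixing matrix of the comm. graph
    beta, eta     : momentum coefficient and step size (assumed hyperparameters)
    """
    # STORM-style momentum surrogate of the local policy gradient.
    u = beta * grads + (1.0 - beta) * (u_prev + grads - grads_prev_iw)
    # Gradient tracking: mix neighbours' trackers, then correct with the
    # change in the local surrogate.
    v_new = W @ v + u - u_prev
    # Consensus step on the parameters followed by a local ascent step.
    theta_new = W @ theta + eta * v_new
    return theta_new, v_new, u
```

In a ring topology, for example, each row of W would put weight only on an agent and its two neighbours; the consensus and tracking steps are what let purely local stochastic gradients drive the agents toward a common policy.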
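
The experiment-setup row translates fairly directly into code. The following is a minimal PyTorch sketch of a policy matching the stated architecture (3 dense layers, 64 hidden units, tanh activations) together with the reported training constants; the observation and action dimensions, the categorical output head, and the class name Policy are assumptions, since the paper's text here does not specify them.

```python
import torch
import torch.nn as nn

GAMMA = 0.99           # discount factor reported in the paper
HORIZON = 50           # steps per episode
NUM_EPISODES = 50_000  # training episodes

class Policy(nn.Module):
    """3-layer dense policy network, 64 hidden units, tanh activations."""

    def __init__(self, obs_dim: int, num_actions: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, hidden), nn.Tanh(),
            nn.Linear(hidden, num_actions),
        )

    def forward(self, obs: torch.Tensor) -> torch.distributions.Categorical:
        # Categorical distribution over a discrete action set (assumed;
        # the paper only states the layer widths and activations).
        return torch.distributions.Categorical(logits=self.net(obs))
```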