MDPGT: Momentum-Based Decentralized Policy Gradient Tracking

Authors: Zhanhong Jiang, Xian Yeow Lee, Sin Yong Tan, Kai Liang Tan, Aditya Balu, Young M Lee, Chinmay Hegde, Soumik Sarkar

AAAI 2022, pp. 9377-9385 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Lastly, we provide empirical results on a multi-agent reinforcement learning benchmark environment to support our theoretical findings.
Researcher Affiliation | Collaboration | (1) Johnson Controls Inc., 507 East Michigan St, Milwaukee, WI 53202; (2) Iowa State University, Ames, IA 50010; (3) New York University, 6 Metro Tech Center, Brooklyn, NY 11201
Pseudocode | Yes | Algorithm 1: MDPGT (a hedged sketch of the update appears below the table)
Open Source Code | Yes | Codes to reproduce results are also available at: https://github.com/xylee95/MD-PGT
Open Datasets | Yes | we performed experiments on a cooperative navigation multi-agent environment that has been commonly used as a benchmark in several previous works (Qu et al. 2019; Zhang et al. 2018; Lu et al. 2021). Our platform for cooperative navigation is derived off the particle environment introduced by (Lowe et al. 2017).
Dataset Splits | No | The paper mentions training for '50k episodes with a horizon of 50 steps and discount factor of 0.99' and evaluation based on 'average training rewards', but it does not specify explicit training, validation, and test dataset splits or percentages.
Hardware Specification | No | The paper states that 'computing infrastructure details are available in the supplementary materials', but no specific hardware specifications (such as GPU models, CPU types, or memory) are provided in the main text.
Software Dependencies | No | The paper describes the policy as a '3-layer dense neural network' and refers to Python in the context of code availability, but it does not specify any software dependencies with version numbers (e.g., Python 3.x, PyTorch 1.x).
Experiment Setup | Yes | The agents were trained for 50k episodes with a horizon of 50 steps and discount factor of 0.99. All agents' policies are represented by a 3-layer dense neural network with 64 hidden units and tanh activation functions. (A hedged sketch of such a policy network follows below.)
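
For orientation, below is a minimal sketch of the kind of momentum-based decentralized policy gradient tracking update that the title and the Algorithm 1 row refer to. It is not the authors' exact Algorithm 1: the function and variable names, the STORM-style momentum surrogate, the importance-weighted gradient term, and the doubly stochastic mixing matrix W are assumptions reconstructed from the standard gradient-tracking template.

```python
import numpy as np

def mdpgt_round(theta, v, u_prev, grads, grads_prev_iw, W, beta, eta):
    """One synchronous round of a momentum-based decentralized policy
    gradient tracking update for N agents with d flattened parameters.

    theta         : (N, d) current policy parameters, one row per agent
    v             : (N, d) gradient-tracking variables
    u_prev        : (N, d) previous momentum-based gradient surrogates
    grads         : (N, d) stochastic policy gradients at the current parameters
    grads_prev_iw : (N, d) importance-weighted gradients at the previous
                    parameters, evaluated on the newly sampled trajectories
    W             : (N, N) doubly stochastic mixing matrix of the comm. graph
    beta, eta     : momentum coefficient and step size (assumed hyperparameters)
    """
    # STORM-style momentum surrogate of the local policy gradient.
    u = beta * grads + (1.0 - beta) * (u_prev + grads - grads_prev_iw)
    # Gradient tracking: mix neighbours' trackers, then correct with the
    # change in the local surrogate.
    v_new = W @ v + u - u_prev
    # Consensus step on the parameters followed by a local ascent step.
    theta_new = W @ theta + eta * v_new
    return theta_new, v_new, u
```

In a ring topology, for example, each row of W would put weight only on an agent and its two neighbours; the consensus and tracking steps are what let purely local stochastic gradients drive the agents toward a common policy.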
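
The experiment-setup row translates fairly directly into code. The following is a minimal PyTorch sketch of a policy matching the stated architecture (3 dense layers, 64 hidden units, tanh activations) together with the reported training constants; the observation and action dimensions, the categorical output head, and the class name Policy are assumptions, since the paper's text here does not specify them.

```python
import torch
import torch.nn as nn

GAMMA = 0.99           # discount factor reported in the paper
HORIZON = 50           # steps per episode
NUM_EPISODES = 50_000  # training episodes

class Policy(nn.Module):
    """3-layer dense policy network, 64 hidden units, tanh activations."""

    def __init__(self, obs_dim: int, num_actions: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, hidden), nn.Tanh(),
            nn.Linear(hidden, num_actions),
        )

    def forward(self, obs: torch.Tensor) -> torch.distributions.Categorical:
        # Categorical distribution over a discrete action set (assumed;
        # the paper only states the layer widths and activations).
        return torch.distributions.Categorical(logits=self.net(obs))
```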