Deep Hierarchical Communication Graph in Multi-Agent Reinforcement Learning
Authors: Zeyang Liu, Lipeng Wan, Xue Sui, Zhuoran Chen, Kewu Sun, Xuguang Lan
IJCAI 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirical results show that our method improves performance across various cooperative multi-agent tasks, including Predator Prey, Multi-Agent Coordination Challenge, and StarCraft Multi-Agent Challenge. ... In this section, we conduct empirical experiments to answer the following questions: (1) Is Deep Hierarchical Communication Graph (DHCG) better than the existing MARL methods? (2) Can DHCG outperform the pre-defined topologies or existing graph-based methods? (3) How does DHCG differ from communication-enabled algorithms? (4) Can DHCG generate different graphs to adapt to different situations? |
| Researcher Affiliation | Academia | Zeyang Liu1, Lipeng Wan1, Xue Sui1, Zhuoran Chen1, Kewu Sun2 and Xuguang Lan1. 1National Key Laboratory of Human-Machine Hybrid Augmented Intelligence, National Engineering Research Center for Visual Information and Applications, and Institute of Artificial Intelligence and Robotics, Xi'an Jiaotong University. 2Intelligent Science & Technology Academy. {zeyang.liu, wanlipeng, suixue98, zhuoran.chen}@stu.xjtu.edu.cn, sun kewu@126.com, xglan@mail.xjtu.edu.cn |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide concrete access to source code for the methodology described. |
| Open Datasets | Yes | In this section, we compare the performance of MAPPO [Yu et al., 2022], HAPPO [Kuba et al., 2022], QMIX [Rashid et al., 2018], DCG [Böhmer et al., 2020], CASEC, SOP-CG [Yang et al., 2022], and DHCG on Predator Prey [Son et al., 2019], Multi-Agent Coordination Challenge (MACO) [Wang et al., 2022], and StarCraft Multi-Agent Challenge (SMAC) [Samvelyan et al., 2019]. |
| Dataset Splits | No | The paper uses standard benchmark datasets but does not explicitly provide specific train/validation/test dataset splits (e.g., percentages or sample counts) needed for reproduction. It mentions running 'five independent runs with different random seeds'. |
| Hardware Specification | No | The paper does not explicitly describe the hardware used to run its experiments (e.g., specific GPU/CPU models, memory, or cloud instance types). |
| Software Dependencies | No | The paper does not provide specific software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow versions). |
| Experiment Setup | Yes | We use an ϵ-greedy exploration scheme, where ϵ decreases from 1 to 0.05 over 50 thousand timesteps in 10m_vs_11m and MMM2, and over 1 million timesteps in corridor and 3s5z_vs_3s6z. (A minimal sketch of this annealing schedule follows the table.) |
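
The quoted setup describes a linearly annealed ϵ-greedy schedule. The sketch below shows one plausible reading of that schedule; the step counts are taken from the quote, while the function names and the linear-decay assumption are illustrative and not taken from the authors' code.

```python
# Hedged sketch: linearly annealed epsilon-greedy exploration, assuming a
# linear decay from 1.0 to 0.05 over `anneal_steps` environment timesteps
# (50k for 10m_vs_11m / MMM2, 1M for corridor / 3s5z_vs_3s6z per the quote).
import random


def epsilon_at(step, start=1.0, end=0.05, anneal_steps=50_000):
    """Linearly decay epsilon from `start` to `end`, then hold at `end`."""
    fraction = min(step / anneal_steps, 1.0)
    return start + fraction * (end - start)


def select_action(q_values, step, anneal_steps=50_000):
    """Epsilon-greedy selection over a list of per-action Q-values."""
    eps = epsilon_at(step, anneal_steps=anneal_steps)
    if random.random() < eps:
        return random.randrange(len(q_values))                     # explore
    return max(range(len(q_values)), key=lambda a: q_values[a])    # exploit


# Example: epsilon is 1.0 at step 0, 0.525 halfway through, 0.05 afterwards.
print(epsilon_at(0), epsilon_at(25_000), epsilon_at(100_000))
```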