Deep Hierarchical Communication Graph in Multi-Agent Reinforcement Learning

Authors: Zeyang Liu, Lipeng Wan, Xue Sui, Zhuoran Chen, Kewu Sun, Xuguang Lan

IJCAI 2023

Reproducibility Variable | Result | LLM Response
Research Type: Experimental. "Empirical results show that our method improves performance across various cooperative multi-agent tasks, including Predator Prey, Multi-Agent Coordination Challenge, and StarCraft Multi-Agent Challenge. ... In this section, we conduct empirical experiments to answer the following questions: (1) Is Deep Hierarchical Communication Graph (DHCG) better than the existing MARL methods? (2) Can DHCG outperform the pre-defined topologies or existing graph-based methods? (3) How does DHCG differ from communication-enabled algorithms? (4) Can DHCG generate different graphs to adapt to different situations?"
Researcher Affiliation: Academia. Zeyang Liu¹, Lipeng Wan¹, Xue Sui¹, Zhuoran Chen¹, Kewu Sun², and Xuguang Lan¹. ¹National Key Laboratory of Human-Machine Hybrid Augmented Intelligence, National Engineering Research Center for Visual Information and Applications, and Institute of Artificial Intelligence and Robotics, Xi'an Jiaotong University; ²Intelligent Science & Technology Academy. Emails: {zeyang.liu, wanlipeng, suixue98, zhuoran.chen}@stu.xjtu.edu.cn, sun kewu@126.com, xglan@mail.xjtu.edu.cn
Pseudocode: No. The paper does not contain structured pseudocode or algorithm blocks.
Open Source Code: No. The paper does not provide concrete access to source code for the described methodology.
Open Datasets: Yes. "In this section, we compare the performance of MAPPO [Yu et al., 2022], HAPPO [Kuba et al., 2022], QMIX [Rashid et al., 2018], DCG [Böhmer et al., 2020], CASEC, SOP-CG [Yang et al., 2022], and DHCG on Predator Prey [Son et al., 2019], Multi-Agent Coordination Challenge (MACO) [Wang et al., 2022], and StarCraft Multi-Agent Challenge (SMAC) [Samvelyan et al., 2019]."
Dataset Splits: No. The paper uses standard benchmark environments but does not explicitly provide the train/validation/test splits (e.g., percentages or sample counts) needed for reproduction; it only mentions running "five independent runs with different random seeds".
Hardware Specification: No. The paper does not describe the hardware used to run its experiments (e.g., specific GPU/CPU models, memory, or cloud instance types).
Software Dependencies: No. The paper does not list software dependencies with version numbers (e.g., Python, PyTorch, or TensorFlow versions).
Experiment Setup: Yes. "We use an ϵ-greedy exploration scheme, where ϵ decreases from 1 to 0.05 over 50 thousand timesteps in 10m vs 11m and MMM2, and over 1 million timesteps in corridor and 3s5z vs 3s6z."
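The quoted setup describes a linear ϵ-annealing schedule. A minimal sketch of that schedule, assuming linear interpolation (the function name and signature are illustrative, not the authors' code):

```python
def epsilon_at(t: int, anneal_steps: int, start: float = 1.0, end: float = 0.05) -> float:
    """Linearly decay epsilon from `start` to `end` over `anneal_steps` timesteps,
    then hold it at `end` for the rest of training."""
    if t >= anneal_steps:
        return end
    return start + (end - start) * (t / anneal_steps)

# 50k-step schedule (10m_vs_11m, MMM2) vs. 1M-step schedule (corridor, 3s5z_vs_3s6z)
print(epsilon_at(0, 50_000))             # 1.0 at the start of training
print(epsilon_at(25_000, 50_000))        # 0.525 halfway through annealing
print(epsilon_at(2_000_000, 1_000_000))  # 0.05 after annealing finishes
```

The longer 1M-step schedule for corridor and 3s5z vs 3s6z keeps exploration high for longer, which is a common choice on harder SMAC maps.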