Hierarchical Mean-Field Deep Reinforcement Learning for Large-Scale Multiagent Systems
Authors: Chao Yu
AAAI 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirical studies show that HMF significantly outperforms existing baselines on both challenging cooperative and mixed cooperative-competitive tasks with different scales of agent populations. ... In this section, we first assess HMF on the Multi-Agent Particle Environments (MPE) (Mordatch and Abbeel 2018), i.e., the Spread and the Pursuit Evasion, in order to evaluate the performance in relatively small-scale agent systems. |
| Researcher Affiliation | Academia | Chao Yu (1, 2); (1) School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou, China; (2) Pengcheng Laboratory, Shenzhen, China; yuchao3@mail.sysu.edu.cn |
| Pseudocode | Yes | Algorithm 1: The HMF Learning Algorithm |
| Open Source Code | No | The paper does not contain an explicit statement about releasing code or a link to a code repository. |
| Open Datasets | Yes | MPE: In the Spread task in Figure 2a, 6 agents have to cover a set of 6 landmarks while avoiding colliding with each other. ... We assess the performance of our method by comparing it against some benchmark algorithms on cooperative tasks including the particle environments (Mordatch and Abbeel 2018) and the MAgent environment (Zheng et al. 2018), as well as a mixed cooperative-competitive task in the social dilemma Bar Game (Arthur 1994). |
| Dataset Splits | No | The paper discusses learning performance over episodes and averages results over random seeds, but it does not specify explicit train/validation/test dataset splits with percentages or sample counts for the environments used. |
| Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., GPU models, CPU types, memory) used for running the experiments. |
| Software Dependencies | No | The paper mentions various environments and algorithms used (e.g., MPE, MAgent, MADDPG, DQN) but does not provide specific version numbers for any software dependencies or libraries. |
| Experiment Setup | Yes | Tables 1 and 2 in the Appendix respectively show the hyper-parameter setting of our methods and the other baselines. |