ToM2C: Target-oriented Multi-agent Communication and Cooperation with Theory of Mind
Authors: Yuanfei Wang, Fangwei Zhong, Jing Xu, Yizhou Wang
ICLR 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The experiments are conducted in two environments. First, in the cooperative navigation scenario (Lowe et al., 2017), the team goal is to occupy landmarks (static targets) and avoid collision. Then we evaluate our method in a more complex scenario, the multi-sensor multi-target covering scenario (Xu et al., 2020). The results show that our method achieves the best performance (the highest reward and the lowest communication cost) among the state-of-the-art MARL methods, e.g., HiT-MAC (Xu et al., 2020), I2C (Ding et al., 2020), MAPPO (Yu et al., 2021) and TarMAC (Das et al., 2019). Moreover, we further show the good scalability of ToM2C and conduct an ablation study to evaluate the contribution of each key component in ToM2C. |
| Researcher Affiliation | Academia | 1 Center for Data Science, Peking University 2 School of Artificial Intelligence, Peking University 3 Center on Frontiers of Computing Studies, School of Computer Science, Peking University 4 Adv. Inst. of Info. Tech, Peking University 5 Beijing Institute for General Artificial Intelligence (BIGAI) { yuanfei_wang, zfw, jing.xu, yizhou.wang }@pku.edu.cn |
| Pseudocode | No | The paper describes the system architecture and its components in detail, including mathematical formulations for learning (e.g., equations 1-6) and a diagram (Figure 2). However, it does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | The code is available at https://github.com/UnrealTracking/ToM2C. |
| Open Datasets | No | The experiments are conducted in two custom-built environments: cooperative navigation (Lowe et al., 2017) and multi-sensor multi-target coverage (Xu et al., 2020). While the environments are described and based on prior work, the paper does not provide concrete access information (link, DOI, specific repository) for publicly available datasets used for training or evaluation. The environments are simulators. |
| Dataset Splits | No | The paper describes the training strategy and parameters, including episode length and discount factor adjustments over time (Appendix B). However, it does not specify explicit training, validation, and test dataset splits (e.g., percentages, sample counts, or predefined partition files) for the data generated by the simulation environments. |
| Hardware Specification | Yes | The environment and model are implemented in Python. The model is built on PyTorch and is trained on a machine with 7 Nvidia GPUs (Titan Xp) and 72 Intel CPU cores. |
| Software Dependencies | No | The paper states that the environment and model are implemented in 'Python' and the model is built on 'PyTorch'. However, it does not provide specific version numbers for these or any other software dependencies, which are required for full reproducibility. |
| Experiment Setup | Yes | The paper provides detailed experimental setup information in Appendix B, including hyper-parameters listed in Table 2 such as GRU hidden units (32), att1 hidden units (64), att2 hidden units (192), max steps (3M), episode length (100), discount factor (0.9), entropy weight (0.005), learning rate (1e-3), workers (6), update frequency (20), ToM Frozen (5), and gamma rate (0.002). |
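For readers attempting a reimplementation, the hyper-parameters quoted above from Appendix B (Table 2) can be collected into a single config. This is a sketch only: the key names below are illustrative and need not match the identifiers used in the authors' released code.

```python
# Hypothetical config collecting the ToM2C hyper-parameters reported
# in the paper's Appendix B (Table 2). Key names are illustrative;
# the authors' repository may use different identifiers.
TOM2C_HPARAMS = {
    "gru_hidden_units": 32,
    "att1_hidden_units": 64,
    "att2_hidden_units": 192,
    "max_steps": 3_000_000,     # "3M" in Table 2
    "episode_length": 100,
    "discount_factor": 0.9,
    "entropy_weight": 0.005,
    "learning_rate": 1e-3,
    "workers": 6,
    "update_frequency": 20,
    "tom_frozen": 5,
    "gamma_rate": 0.002,
}

def describe(hparams: dict) -> str:
    """Render the hyper-parameters as one 'name = value' line each."""
    return "\n".join(f"{k} = {v}" for k, v in hparams.items())

if __name__ == "__main__":
    print(describe(TOM2C_HPARAMS))
```

Recording the settings this way makes it easy to diff a reimplementation's configuration against the paper's reported values.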