Universally Expressive Communication in Multi-Agent Reinforcement Learning
Authors: Matthew Morris, Thomas D Barrett, Arnu Pretorius
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirically, these augmentations are found to improve performance on tasks where expressive communication is required, whilst, in general, the optimal communication protocol is found to be task-dependent. |
| Researcher Affiliation | Collaboration | Matthew Morris (InstaDeep Ltd. & University of Oxford, matthew.morris@cs.ox.ac.uk); Thomas D. Barrett (InstaDeep Ltd., t.barrett@instadeep.com); Arnu Pretorius (InstaDeep Ltd., a.pretorius@instadeep.com) |
| Pseudocode | No | No structured pseudocode or algorithm blocks were found in the paper. |
| Open Source Code | Yes | Code, environments, and instructions for reproducibility are included in the supplemental material. |
| Open Datasets | Yes | Predator-Prey [9, 27, 29, 37, 49] and Traffic Junction [9, 27, 29, 37, 49, 51] are common MARL communication benchmarks. ... We also introduce two new environments, Drone Scatter and Box Pushing, to respectively test symmetry-breaking and communication expressivity beyond 1-WL. ... New environments and code provided in the supplemental material. |
| Dataset Splits | Yes | Each epoch consists of 5000 training episodes, after which 100 evaluation episodes are used to report aggregate metric scores, yielding an evaluation score for the model after every epoch. (An illustrative epoch-loop sketch follows the table.) |
| Hardware Specification | Yes | All experiments were run on a single machine equipped with an Intel Core i9-9900K CPU, 64GB of RAM, and an NVIDIA GeForce RTX 3090 GPU. |
| Software Dependencies | Yes | Our code base is written in Python 3.8. We use PyTorch 1.10.1 as our deep learning framework and OpenAI Gym 0.21.0 for the environment interface. |
| Experiment Setup | Yes | Full experiment and hyperparameter details can be found in Appendix C, and full results are shown in Appendix D. ... For each scenario and for every baseline communication method, we compare 4 models: the baseline without modifications, the baseline augmented with unique IDs for each agent, the baseline augmented with 0.75 RNI, and finally 0.25 RNI. (A hedged sketch of these observation augmentations follows the table.) |
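
The Dataset Splits row describes the evaluation protocol: 5000 training episodes per epoch followed by 100 evaluation episodes whose aggregate score is reported. The following is a minimal, self-contained sketch of that structure only; `run_episode` and `run_epoch` are hypothetical stubs for illustration, not functions from the released code.

```python
import random

TRAIN_EPISODES_PER_EPOCH = 5000
EVAL_EPISODES_PER_EPOCH = 100


def run_episode(train: bool) -> float:
    """Hypothetical stand-in for one environment episode; returns a score."""
    return random.random()


def run_epoch() -> float:
    """Train for 5000 episodes, then report the mean score over 100 evaluation episodes."""
    for _ in range(TRAIN_EPISODES_PER_EPOCH):
        run_episode(train=True)
    eval_scores = [run_episode(train=False) for _ in range(EVAL_EPISODES_PER_EPOCH)]
    return sum(eval_scores) / EVAL_EPISODES_PER_EPOCH
```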
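
The Experiment Setup row compares a baseline against three observation augmentations: unique agent IDs, 0.75 RNI, and 0.25 RNI (random node initialisation). The sketch below shows one plausible way such augmentations could be applied to per-agent observations. The function name `augment_observations` and the reading of the RNI fraction as the share of random dimensions in the augmented observation are assumptions for illustration, not details taken from the paper's released code.

```python
import torch


def augment_observations(obs: torch.Tensor, mode: str, rni_fraction: float = 0.75) -> torch.Tensor:
    """Illustrative sketch of the observation augmentations compared in the paper.

    obs: (n_agents, obs_dim) tensor of per-agent observations.
    mode: "none" (baseline), "ids" (unique agent IDs), or "rni".
    rni_fraction: assumed here to be the share of the augmented observation
        made up of random dimensions (e.g. 0.75 or 0.25), with 0 < fraction < 1.
    """
    n_agents, obs_dim = obs.shape
    if mode == "none":
        return obs
    if mode == "ids":
        # Append a one-hot unique ID for each agent (symmetry breaking).
        ids = torch.eye(n_agents, dtype=obs.dtype, device=obs.device)
        return torch.cat([obs, ids], dim=-1)
    if mode == "rni":
        # Random node initialisation: append noise resampled every episode,
        # sized so that it makes up `rni_fraction` of the final vector.
        n_random = round(rni_fraction * obs_dim / (1.0 - rni_fraction))
        noise = torch.randn(n_agents, n_random, dtype=obs.dtype, device=obs.device)
        return torch.cat([obs, noise], dim=-1)
    raise ValueError(f"unknown augmentation mode: {mode}")
```

Under this assumed reading, a 16-dimensional observation with `rni_fraction=0.75` would have 48 random dimensions appended, so three quarters of the final vector is noise.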