Universally Expressive Communication in Multi-Agent Reinforcement Learning

Authors: Matthew Morris, Thomas D. Barrett, Arnu Pretorius

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Empirically, these augmentations are found to improve performance on tasks where expressive communication is required, whilst, in general, the optimal communication protocol is found to be task-dependent.
Researcher Affiliation | Collaboration | Matthew Morris, InstaDeep Ltd. & University of Oxford (matthew.morris@cs.ox.ac.uk); Thomas D. Barrett, InstaDeep Ltd. (t.barrett@instadeep.com); Arnu Pretorius, InstaDeep Ltd. (a.pretorius@instadeep.com)
Pseudocode | No | No structured pseudocode or algorithm blocks were found in the paper.
Open Source Code | Yes | Code, environments, and instructions for reproducibility are included in the supplemental material.
Open Datasets | Yes | Predator-Prey [9, 27, 29, 37, 49] and Traffic Junction [9, 27, 29, 37, 49, 51] are common MARL communication benchmarks. ... We also introduce two new environments, Drone Scatter and Box Pushing, to respectively test symmetry-breaking and communication expressivity beyond 1-WL. ... New environments and code are provided in the supplemental material.
Dataset Splits | Yes | Each epoch consists of 5000 training episodes, after which 100 evaluation episodes are used to report aggregate metric scores, yielding an evaluation score for the model after every epoch.
Hardware Specification | Yes | All experiments were run on a single machine equipped with an Intel Core i9-9900K CPU, 64GB of RAM, and an NVIDIA GeForce RTX 3090 GPU.
Software Dependencies | Yes | Our code base is written in Python 3.8. We use PyTorch 1.10.1 as our deep learning framework and OpenAI Gym 0.21.0 for the environment interface.
Experiment Setup | Yes | Full experiment and hyperparameter details can be found in Appendix C, and full results are shown in Appendix D. ... For each scenario and for every baseline communication method, we compare 4 models: the baseline without modifications, the baseline augmented with unique IDs for each agent, the baseline augmented with 0.75 RNI, and finally 0.25 RNI.
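The four model variants compared in the experiment setup differ only in how each agent's input features are augmented. The sketch below illustrates that comparison as observation preprocessing; the function name, mode names, and the exact way random features are mixed in are our own assumptions for illustration, not the paper's implementation (the paper applies Random Node Initialisation, RNI, to agent node features in the communication graph).

```python
import numpy as np

def augment_observation(obs, agent_idx, n_agents, mode, rni_fraction=0.75, rng=None):
    """Hypothetical sketch of the four compared augmentations.

    mode == "none": baseline observation, unchanged.
    mode == "ids":  baseline plus a one-hot unique agent ID (breaks symmetry
                    deterministically).
    mode == "rni":  baseline plus random features, sized so that roughly
                    `rni_fraction` of the final vector is random noise
                    (e.g. 0.75 RNI or 0.25 RNI in the paper's comparison).
    """
    rng = rng or np.random.default_rng()
    if mode == "none":
        return obs
    if mode == "ids":
        one_hot = np.zeros(n_agents)
        one_hot[agent_idx] = 1.0
        return np.concatenate([obs, one_hot])
    if mode == "rni":
        # Choose k extra random dims so that k / (len(obs) + k) ~ rni_fraction.
        # Assumes 0 < rni_fraction < 1.
        d = obs.shape[0]
        k = int(round(rni_fraction * d / (1.0 - rni_fraction)))
        return np.concatenate([obs, rng.standard_normal(k)])
    raise ValueError(f"unknown mode: {mode}")
```

Under this sizing rule, a 4-dimensional observation with 0.75 RNI gains 12 random dimensions (12/16 = 0.75), while 0.25 RNI gains just 1 (1/5 = 0.2, the closest integer fit).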