Learning to Communicate with Deep Multi-Agent Reinforcement Learning

Authors: Jakob Foerster, Ioannis Alexandros Assael, Nando de Freitas, Shimon Whiteson

NeurIPS 2016

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our experiments introduce new environments for studying the learning of communication protocols and present a set of engineering innovations that are essential for success in these domains. Experiments on two benchmark tasks, based on the MNIST dataset and a well known riddle, show, not only can these methods solve these tasks, they often discover elegant communication protocols along the way.
Researcher Affiliation | Collaboration | (1) University of Oxford, United Kingdom; (2) Canadian Institute for Advanced Research, CIFAR NCAP Program; (3) Google DeepMind
Pseudocode | Yes | Further algorithmic details and pseudocode are in the supplementary material.
Open Source Code | Yes | Source code is available at: https://github.com/iassael/learning-to-communicate
Open Datasets | Yes | Experiments on two benchmark tasks, based on the MNIST dataset and a well known riddle... MNIST digit classification dataset [25].
Dataset Splits | No | The paper describes training RL agents within environments and evaluating their performance. It does not provide explicit training/validation/test dataset splits in the conventional sense for a static dataset like MNIST (e.g., '80% training, 10% validation, 10% test').
Hardware Specification | No | The paper does not provide specific details about the hardware used for experiments, such as GPU or CPU models, memory, or cloud instance types.
Software Dependencies | No | The paper mentions software components and algorithms like 'RMSProp' and 'GRU' but does not provide specific version numbers for these or other software dependencies necessary for replication.
Experiment Setup | Yes | In our experiments, we use an ϵ-greedy policy with ϵ = 0.05, the discount factor is γ = 1, and the target network is reset every 100 episodes. To stabilise learning, we execute parallel episodes in batches of 32. The parameters are optimised using RMSProp [19] with a learning rate of 5 × 10⁻⁴. Unless stated otherwise, we set the standard deviation of noise added to the channel to σ = 2, which was found to be essential for good performance.
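
For readers sketching a reimplementation, the quoted settings map onto a configuration roughly like the following. This is a minimal PyTorch sketch under stated assumptions: the released code at github.com/iassael/learning-to-communicate is written in Torch/Lua, so QNet, epsilon_greedy, and noisy_channel below are hypothetical placeholders rather than the authors' identifiers; only the numeric hyperparameters (ϵ = 0.05, γ = 1, target reset every 100 episodes, batches of 32, RMSProp at 5 × 10⁻⁴, channel noise σ = 2) come from the paper.

```python
# Hypothetical configuration sketch of the quoted experiment setup (not the authors' code).
import copy
import torch
import torch.nn as nn

EPSILON = 0.05          # epsilon-greedy exploration probability
GAMMA = 1.0             # discount factor (undiscounted episodes)
TARGET_RESET = 100      # copy online net into target net every 100 episodes
BATCH_SIZE = 32         # parallel episodes executed per batch
LEARNING_RATE = 5e-4    # RMSProp learning rate (5 x 10^-4)
CHANNEL_SIGMA = 2.0     # std of Gaussian noise added to the communication channel

class QNet(nn.Module):
    """Placeholder Q-network; the paper's agents are recurrent (GRU-based)."""
    def __init__(self, obs_dim: int, n_actions: int, hidden: int = 128):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU(),
                                 nn.Linear(hidden, n_actions))

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return self.net(obs)

def epsilon_greedy(q_values: torch.Tensor, epsilon: float = EPSILON) -> torch.Tensor:
    """Take the greedy action, replaced by a uniform random action with probability epsilon."""
    greedy = q_values.argmax(dim=-1)
    random_actions = torch.randint(q_values.shape[-1], greedy.shape)
    explore = torch.rand(greedy.shape) < epsilon
    return torch.where(explore, random_actions, greedy)

def noisy_channel(message: torch.Tensor, sigma: float = CHANNEL_SIGMA) -> torch.Tensor:
    """During training, add Gaussian noise to the real-valued message on the channel."""
    return message + sigma * torch.randn_like(message)

online_net = QNet(obs_dim=16, n_actions=5)
target_net = copy.deepcopy(online_net)  # refreshed from online_net every TARGET_RESET episodes
optimizer = torch.optim.RMSprop(online_net.parameters(), lr=LEARNING_RATE)
```

In this reading, the σ = 2 noise term corresponds to the noise the paper reports adding to the communication channel during training, and the target network is simply overwritten with the online weights at the stated 100-episode interval; network sizes and dimensions above are illustrative only.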