Learning to Communicate with Deep Multi-Agent Reinforcement Learning
Authors: Jakob Foerster, Ioannis Alexandros Assael, Nando de Freitas, Shimon Whiteson
NeurIPS 2016
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments introduce new environments for studying the learning of communication protocols and present a set of engineering innovations that are essential for success in these domains. Experiments on two benchmark tasks, based on the MNIST dataset and a well known riddle, show, not only can these methods solve these tasks, they often discover elegant communication protocols along the way. |
| Researcher Affiliation | Collaboration | (1) University of Oxford, United Kingdom; (2) Canadian Institute for Advanced Research, CIFAR NCAP Program; (3) Google DeepMind |
| Pseudocode | Yes | Further algorithmic details and pseudocode are in the supplementary material. |
| Open Source Code | Yes | Source code is available at: https://github.com/iassael/learning-to-communicate |
| Open Datasets | Yes | Experiments on two benchmark tasks, based on the MNIST dataset and a well known riddle... MNIST digit classification dataset [25]. |
| Dataset Splits | No | The paper describes training RL agents within environments and evaluating their performance. It does not provide explicit training/validation/test dataset splits in the conventional sense for a static dataset like MNIST (e.g., '80% training, 10% validation, 10% test'). |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for experiments, such as GPU or CPU models, memory, or cloud instance types. |
| Software Dependencies | No | The paper names algorithmic components such as RMSProp and GRUs, but it does not specify the software frameworks, libraries, or version numbers required for replication. |
| Experiment Setup | Yes | In our experiments, we use an ϵ-greedy policy with ϵ = 0.05, the discount factor is γ = 1, and the target network is reset every 100 episodes. To stabilise learning, we execute parallel episodes in batches of 32. The parameters are optimised using RMSProp [19] with a learning rate of 5 × 10⁻⁴. Unless stated otherwise, we set the standard deviation of noise added to the channel to σ = 2, which was found to be essential for good performance. |
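
The quoted experiment-setup row fixes every scalar hyperparameter a re-run would need. Below is a minimal Python/NumPy sketch that gathers those values and illustrates the two mechanisms they govern, ϵ-greedy action selection and Gaussian noise on the communication channel. The `CONFIG` dict and function names are illustrative assumptions of ours, not taken from the authors' released Torch code.

```python
import numpy as np

# Hypothetical sketch: hyperparameters copied from the quoted experiment setup.
CONFIG = {
    "epsilon": 0.05,               # epsilon-greedy exploration rate
    "gamma": 1.0,                  # discount factor
    "target_reset_episodes": 100,  # episodes between target-network resets
    "batch_size": 32,              # parallel episodes per batch
    "learning_rate": 5e-4,         # RMSProp learning rate
    "channel_noise_sigma": 2.0,    # std of noise added to the channel
}

rng = np.random.default_rng(0)

def epsilon_greedy(q_values, epsilon=CONFIG["epsilon"]):
    """Return a random action index with probability epsilon, else argmax Q."""
    if rng.random() < epsilon:
        return int(rng.integers(len(q_values)))
    return int(np.argmax(q_values))

def noisy_channel(message, sigma=CONFIG["channel_noise_sigma"]):
    """Add Gaussian noise (std sigma) to a real-valued message, mirroring the
    noise the paper injects into the communication channel during training."""
    return message + rng.normal(0.0, sigma, size=np.shape(message))

# Example usage: pick an action and perturb a one-element message vector.
action = epsilon_greedy(np.array([0.1, 0.4, 0.2]))
noisy_msg = noisy_channel(np.array([0.3]))
```

Note that the paper's channel unit also discretises messages at execution time; the sketch covers only the training-time noise term described in the quoted setup.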