Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026.
Learning to Communicate with Deep Multi-Agent Reinforcement Learning
Authors: Jakob Foerster, Ioannis Alexandros Assael, Nando de Freitas, Shimon Whiteson
NeurIPS 2016
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments introduce new environments for studying the learning of communication protocols and present a set of engineering innovations that are essential for success in these domains. Experiments on two benchmark tasks, based on the MNIST dataset and a well known riddle, show, not only can these methods solve these tasks, they often discover elegant communication protocols along the way. |
| Researcher Affiliation | Collaboration | ¹University of Oxford, United Kingdom; ²Canadian Institute for Advanced Research, CIFAR NCAP Program; ³Google DeepMind |
| Pseudocode | Yes | Further algorithmic details and pseudocode are in the supplementary material. |
| Open Source Code | Yes | Source code is available at: https://github.com/iassael/learning-to-communicate |
| Open Datasets | Yes | Experiments on two benchmark tasks, based on the MNIST dataset and a well known riddle... MNIST digit classification dataset [25]. |
| Dataset Splits | No | The paper describes training RL agents within environments and evaluating their performance. It does not provide explicit training/validation/test dataset splits in the conventional sense for a static dataset like MNIST (e.g., '80% training, 10% validation, 10% test'). |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for experiments, such as GPU or CPU models, memory, or cloud instance types. |
| Software Dependencies | No | The paper mentions software components and algorithms like 'RMSProp' and 'GRU' but does not provide specific version numbers for these or other software dependencies necessary for replication. |
| Experiment Setup | Yes | In our experiments, we use an ϵ-greedy policy with ϵ = 0.05, the discount factor is γ = 1, and the target network is reset every 100 episodes. To stabilise learning, we execute parallel episodes in batches of 32. The parameters are optimised using RMSProp [19] with a learning rate of 5 × 10⁻⁴. Unless stated otherwise, we set the standard deviation of noise added to the channel to σ = 2, which was found to be essential for good performance. |
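
For readers who want the Experiment Setup values in one place, the following is a minimal sketch of how they might be collected into a configuration object. It is not taken from the authors' repository: the class name `ExperimentConfig`, its field names, and the helpers `epsilon_greedy`, `add_channel_noise`, and `should_reset_target` are all hypothetical, and the zero-mean Gaussian form of the channel noise is an assumption, since the quoted text gives only the standard deviation.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class ExperimentConfig:
    """Values quoted in the Experiment Setup row; field names are illustrative."""
    epsilon: float = 0.05              # exploration rate of the epsilon-greedy policy
    gamma: float = 1.0                 # discount factor
    target_reset_episodes: int = 100   # target network reset interval, in episodes
    batch_size: int = 32               # parallel episodes per batch
    learning_rate: float = 5e-4        # RMSProp learning rate
    channel_noise_std: float = 2.0     # sigma of the noise added to the channel


def epsilon_greedy(q_values, epsilon, rng):
    """Return a uniformly random action with probability epsilon, else the greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


def add_channel_noise(message, std, rng):
    """Add zero-mean Gaussian noise to a real-valued message (assumed noise model)."""
    return message + rng.gauss(0.0, std)


def should_reset_target(episode, interval):
    """True on every `interval`-th episode after the first (hypothetical schedule)."""
    return episode > 0 and episode % interval == 0


if __name__ == "__main__":
    cfg = ExperimentConfig()
    rng = random.Random(0)
    action = epsilon_greedy([0.1, 0.9, 0.3], cfg.epsilon, rng)
    noisy = add_channel_noise(1.0, cfg.channel_noise_std, rng)
    print(action, noisy, should_reset_target(100, cfg.target_reset_episodes))
```

Keeping the values in a frozen dataclass makes it harder to change a hyperparameter by accident between runs; the actual training loop, network architecture, and optimiser wiring are outside the scope of this sketch.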