Biases for Emergent Communication in Multi-agent Reinforcement Learning
Authors: Tom Eccles, Yoram Bachrach, Guy Lever, Angeliki Lazaridou, Thore Graepel
NeurIPS 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Section 4, Empirical Analysis: We consider two environments. The first is a simple one-step environment, where agents must sum MNIST digits by communicating their value. ... The second environment is a new multi-step MARL environment which we name Treasure Hunt. |
| Researcher Affiliation | Industry | Tom Eccles (DeepMind, London, UK, eccles@google.com); Yoram Bachrach (DeepMind, London, UK, yorambac@google.com); Guy Lever (DeepMind, London, UK, guylever@google.com); Angeliki Lazaridou (DeepMind, London, UK, angeliki@google.com); Thore Graepel (DeepMind, London, UK, thore@google.com) |
| Pseudocode | Yes | Algorithm 1: Calculation of positive signalling loss (see the sketch below the table) |
| Open Source Code | No | The paper does not include any explicit statements or links indicating that the source code for the described methodology is publicly available. |
| Open Datasets | Yes | Section 4.1, Summing MNIST digits: In this task, depicted in Figure 1, the speaker and listener agents each observe a different MNIST digit (as an image), and must determine the sum of the digits. (See the interaction sketch below the table.) |
| Dataset Splits | No | The paper mentions training agents and uses "batch of rollouts" but does not explicitly provide specific train/validation/test dataset splits (e.g., percentages, sample counts, or predefined split references). |
| Hardware Specification | No | The paper does not explicitly describe the specific hardware used (e.g., GPU/CPU models, memory amounts, or detailed computer specifications) to run its experiments. |
| Software Dependencies | No | The paper mentions algorithms and methods like REINFORCE, Advantage Actor-Critic, V-trace, and RMSProp, but it does not provide specific version numbers for any software libraries or dependencies (e.g., Python, TensorFlow, PyTorch versions). |
| Experiment Setup | No | The full details of the Treasure Hunt environment, together with the hyperparameters used in our agents, can be found in the supplementary material. The setup details are therefore deferred to the supplement rather than given in the main text. |
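The Pseudocode row above points to Algorithm 1 (calculation of the positive signalling loss). Below is a minimal NumPy sketch, written under the assumption that the bias rewards a high-entropy batch-average message distribution while keeping each per-step message distribution's entropy close to a target; `positive_signalling_loss`, `target_entropy`, and the weight `lambda_` are illustrative names, and the exact form should be checked against Algorithm 1 in the paper.

```python
import numpy as np

def entropy(p, eps=1e-8):
    """Shannon entropy of a (batch of) categorical distribution(s)."""
    return -np.sum(p * np.log(p + eps), axis=-1)

def positive_signalling_loss(message_probs, target_entropy, lambda_=3.0):
    """Hedged sketch of a positive-signalling bias.

    message_probs: array [batch, num_messages] holding the speaker's
        message policy pi_m(. | s_t) for each state in a batch of rollouts.
    The loss (to be minimised) rewards a high-entropy *average* message
    distribution over the batch and penalises per-step message entropies
    that deviate from `target_entropy`, so messages vary with the state.
    """
    avg_policy = message_probs.mean(axis=0)        # batch-average message policy
    batch_entropy = entropy(avg_policy)            # H of the average: want this high
    per_step_entropy = entropy(message_probs)      # H(pi_m(.|s_t)) for each state
    conditional_term = np.mean((per_step_entropy - target_entropy) ** 2)
    return -batch_entropy + lambda_ * conditional_term

# Example: 4 rollout steps, 3 possible messages
probs = np.array([[0.9, 0.05, 0.05],
                  [0.1, 0.8, 0.1],
                  [0.05, 0.05, 0.9],
                  [0.8, 0.1, 0.1]])
print(positive_signalling_loss(probs, target_entropy=0.5))
```

Minimising a loss of this shape pushes the speaker to use different messages in different states (high batch-average entropy) without letting each conditional message policy become either uniform or degenerate.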
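The summing-MNIST quote in the Open Datasets row describes a one-step task in which the speaker and listener each observe a different MNIST digit and the listener must report the sum. The sketch below only illustrates that information flow under assumed interfaces; `summing_mnist_step`, `speaker_act`, `listener_act`, and the 0/1 reward are hypothetical and are not the paper's implementation.

```python
import numpy as np

def summing_mnist_step(speaker_act, listener_act, images, labels, rng):
    """One-step episode of the (assumed) summing-MNIST interface:
    the speaker sees digit A, the listener sees digit B plus the
    speaker's message, and the listener must output A + B."""
    i, j = rng.integers(len(images), size=2)   # draw one digit per agent
    message = speaker_act(images[i])           # discrete message from the speaker
    guess = listener_act(images[j], message)   # listener's guess of the sum
    return 1.0 if guess == labels[i] + labels[j] else 0.0

# Toy usage with random 28x28 "images" and placeholder agents.
rng = np.random.default_rng(0)
images = rng.random((10, 28, 28))
labels = rng.integers(10, size=10)
speaker_act = lambda img: 0        # always sends message 0
listener_act = lambda img, msg: 9  # always guesses 9
print(summing_mnist_step(speaker_act, listener_act, images, labels, rng))
```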