Biases for Emergent Communication in Multi-agent Reinforcement Learning

Authors: Tom Eccles, Yoram Bachrach, Guy Lever, Angeliki Lazaridou, Thore Graepel

NeurIPS 2019

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | "We consider two environments. The first is a simple one-step environment, where agents must sum MNIST digits by communicating their value. ... The second environment is a new multi-step MARL environment which we name Treasure Hunt." (Section 4, Empirical Analysis) |
| Researcher Affiliation | Industry | All five authors list DeepMind, London, UK as their affiliation: Tom Eccles (eccles@google.com), Yoram Bachrach (yorambac@google.com), Guy Lever (guylever@google.com), Angeliki Lazaridou (angeliki@google.com), Thore Graepel (thore@google.com). |
| Pseudocode | Yes | "Algorithm 1: Calculation of positive signalling loss" (an illustrative loss sketch follows this table) |
| Open Source Code | No | The paper does not include any explicit statement or link indicating that source code for the described methodology is publicly available. |
| Open Datasets | Yes | "4.1 Summing MNIST digits: In this task, depicted in Figure 1, the speaker and listener agents each observe a different MNIST digit (as an image), and must determine the sum of the digits." (a data-loading sketch also follows this table) |
| Dataset Splits | No | The paper mentions training agents on a "batch of rollouts" but does not provide explicit train/validation/test splits (e.g., percentages, sample counts, or references to predefined splits). |
| Hardware Specification | No | The paper does not describe the hardware used to run its experiments (e.g., GPU/CPU models or memory amounts). |
| Software Dependencies | No | The paper mentions algorithms and methods such as REINFORCE, Advantage Actor-Critic, V-trace, and RMSProp, but it does not give version numbers for any software libraries or dependencies (e.g., Python, TensorFlow, or PyTorch versions). |
| Experiment Setup | No | "The full details of the Treasure Hunt environment, together with the hyperparameters used in our agents, can be found in the supplementary material." This indicates that the experiment setup is specified in the supplementary material rather than in the main text. |
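
The "Pseudocode" row refers to Algorithm 1 of the paper, which computes a positive signalling loss for the speaker. As a rough illustration only (not a reproduction of Algorithm 1), the sketch below shows one common way to bias a speaker toward informative messages: maximise a batch estimate of the mutual information between the speaker's input and its message, I(m; s) ~ H(mean_s pi(m|s)) - mean_s H(pi(m|s)). The function name `positive_signalling_bias`, the use of PyTorch, and the omission of any entropy-target terms are assumptions here; see the paper for the exact formulation.

```python
import torch
import torch.nn.functional as F

def positive_signalling_bias(message_logits: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    """Illustrative positive-signalling surrogate (not the paper's Algorithm 1).

    message_logits: [batch, num_messages] speaker logits, one row per sampled state.
    Minimising the returned scalar maximises a batch estimate of the mutual
    information between state and message:
        I(m; s) ~= H(mean_s pi(m|s)) - mean_s H(pi(m|s)).
    """
    probs = F.softmax(message_logits, dim=-1)         # pi(m|s) for each state in the batch
    marginal = probs.mean(dim=0)                      # batch estimate of the average message policy
    marginal_entropy = -(marginal * (marginal + eps).log()).sum()
    conditional_entropy = -(probs * (probs + eps).log()).sum(dim=-1).mean()
    return -(marginal_entropy - conditional_entropy)  # negative mutual-information estimate
```

In training, such a term would typically be added with a small coefficient to the speaker's usual policy-gradient loss; Algorithm 1 in the paper differs in its details, so this sketch should not be read as the authors' method.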
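For the "Open Datasets" row, the quoted summing task requires only the standard MNIST dataset. The sketch below shows one way a batch for that one-step task could be assembled from the task description alone; the helper name `sample_summing_batch`, the torchvision download path, and the batch size are illustrative assumptions, not details taken from the paper.

```python
import torch
from torchvision import datasets, transforms

def sample_summing_batch(mnist, batch_size: int = 32):
    """Assemble one batch for the one-step digit-summing task described above.

    Each example pairs two random MNIST images: the speaker observes the first,
    the listener observes the second (plus the speaker's message), and the
    target is the sum of the two digit labels (an integer in 0..18).
    """
    idx = torch.randint(len(mnist), (2, batch_size))
    speaker_imgs = torch.stack([mnist[int(i)][0] for i in idx[0]])
    listener_imgs = torch.stack([mnist[int(i)][0] for i in idx[1]])
    targets = torch.tensor([mnist[int(i)][1] for i in idx[0]]) + \
              torch.tensor([mnist[int(i)][1] for i in idx[1]])
    return speaker_imgs, listener_imgs, targets

# Example usage (downloads MNIST on first run; the path is arbitrary):
# mnist = datasets.MNIST("/tmp/mnist", train=True, download=True,
#                        transform=transforms.ToTensor())
# speaker_imgs, listener_imgs, targets = sample_summing_batch(mnist)
```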