Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Biases for Emergent Communication in Multi-agent Reinforcement Learning

Authors: Tom Eccles, Yoram Bachrach, Guy Lever, Angeliki Lazaridou, Thore Graepel

NeurIPS 2019

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "We consider two environments. The first is a simple one-step environment, where agents must sum MNIST digits by communicating their value. ... The second environment is a new multi-step MARL environment which we name Treasure Hunt." (Section 4, Empirical Analysis) |
| Researcher Affiliation | Industry | All five authors (Tom Eccles, Yoram Bachrach, Guy Lever, Angeliki Lazaridou, Thore Graepel) are affiliated with DeepMind, London, UK. |
| Pseudocode | Yes | "Algorithm 1: Calculation of positive signalling loss" |
| Open Source Code | No | The paper does not include any explicit statement or link indicating that source code for the described methodology is publicly available. |
| Open Datasets | Yes | "In this task, depicted in Figure 1, the speaker and listener agents each observe a different MNIST digit (as an image), and must determine the sum of the digits." (Section 4.1, Summing MNIST digits) |
| Dataset Splits | No | The paper mentions training agents on a "batch of rollouts" but does not provide explicit train/validation/test splits (e.g., percentages, sample counts, or references to predefined splits). |
| Hardware Specification | No | The paper does not describe the hardware used to run its experiments (e.g., GPU/CPU models or memory amounts). |
| Software Dependencies | No | The paper names REINFORCE, Advantage Actor-Critic, V-trace, and RMSProp, but gives no version numbers for software libraries or dependencies (e.g., Python, TensorFlow, or PyTorch versions). |
| Experiment Setup | No | "The full details of the Treasure Hunt environment, together with the hyperparameters used in our agents, can be found in the supplementary material." The setup details are therefore not in the main text. |
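The pseudocode the paper provides (Algorithm 1) computes a positive signalling loss, a bias that rewards the speaker for sending messages that depend on its observation. The sketch below is a minimal mutual-information-style illustration of that idea, not the authors' exact Algorithm 1; the function name, the `target_entropy` and `coef` parameters, and the batch-average entropy proxy are all assumptions made here for clarity.

```python
import numpy as np

def positive_signalling_loss(message_probs, target_entropy=0.5, coef=3.0):
    """Sketch of a positive-signalling bias (assumed form, not the paper's).

    message_probs: array of shape (batch, n_messages), where row i is the
    speaker's message distribution pi(m | s_i) for one state in the batch.

    The loss encourages high entropy of the batch-average message
    distribution (a proxy for H(m)) while pushing each per-state entropy
    (a proxy for H(m | s)) toward a low target, so messages vary across
    states but are near-deterministic given a state.
    """
    eps = 1e-8  # guards log(0) for deterministic rows
    # Entropy of the batch-average message distribution.
    avg = message_probs.mean(axis=0)
    h_avg = -np.sum(avg * np.log(avg + eps))
    # Per-state entropies of the speaker's message distributions.
    h_cond = -np.sum(message_probs * np.log(message_probs + eps), axis=1)
    # Maximise H(m); keep H(m | s) close to the target.
    return -h_avg + coef * np.mean((h_cond - target_entropy) ** 2)
```

Under this sketch, a speaker that sends a distinct message per state incurs a lower loss than one that always sends the same message, which is the qualitative behaviour the positive signalling bias is meant to induce.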