Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
Emergent Communication through Negotiation
Authors: Kris Cao, Angeliki Lazaridou, Marc Lanctot, Joel Z Leibo, Karl Tuyls, Stephen Clark
ICLR 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In our first experiment, we test whether purely self-interested agents can learn to negotiate and divide up items fairly, and investigate the effect of the various communication channels on negotiation success. We train self-interested agents to negotiate for 500k episodes. Each episode corresponds to a batch of 128 games, each with item pools and hidden utilities generated as described in Section 2.1. |
| Researcher Affiliation | Collaboration | Kris Cao (Department of Computer Science and Technology, University of Cambridge, UK); Angeliki Lazaridou, Marc Lanctot, Joel Z Leibo, Karl Tuyls, Stephen Clark (DeepMind, London, UK) |
| Pseudocode | No | The paper describes the agent architecture and learning process in detailed paragraphs, but it does not include any structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide an explicit statement about releasing its source code, nor does it include a link to a code repository for the methodology described. |
| Open Datasets | No | The paper states that 'At each round (i) an item pool is sampled uniformly, instantiating a quantity (between 0 and 5) for each of the types and represented as a vector $i \in \{0, \dots, 5\}^3$ and (ii) each agent $j$ receives a utility function sampled uniformly, which specifies how rewarding one unit of each item is (with item rewards between 0 and 10, and with the constraint that there is at least one item with non-zero utility), represented as a vector $u_j \in \{0, \dots, 10\}^3$.' This indicates that the data are procedurally generated rather than drawn from a publicly available dataset with concrete access information. |
| Dataset Splits | No | The paper states 'We also hold out 5 batches worth of games as a test set, and test the agents every 50 episodes', indicating a train-test split, but no explicit validation set or split percentages are provided. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU/CPU models, processor types, or memory amounts used for running its experiments. |
| Software Dependencies | No | The paper mentions using 'the ADAM optimizer (Kingma & Ba, 2014)', but it does not provide specific version numbers for software components or libraries. |
| Experiment Setup | Yes | Embedding sizes, and all neural network hidden states, had dimension 100. We used the ADAM optimizer (Kingma & Ba, 2014), with default parameter settings, to optimize the parameters of each agent. [...] We used a separate value of λ, the entropy regularisation weight hyperparameter, for each policy. For $\pi^{\text{term}}$ and $\pi^{\text{prop}}$, λ = 0.05; for $\pi^{\text{utt}}$, λ = 0.001. The symbol vocabulary size was 11, and the agents were allowed to generate utterances of up to length 6. |
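Because the paper releases no code, the game-generation procedure quoted in the Open Datasets row, together with the batch size (Research Type), the held-out test set (Dataset Splits) and the reported hyperparameters (Experiment Setup), can only be sketched. The Python below is a minimal illustration under stated assumptions, not the authors' implementation. All names (`Game`, `sample_item_pool`, `sample_utility`, `sample_batch`, `make_test_set`, `HYPERPARAMS` and the constants) are hypothetical. Enforcing the non-zero-utility constraint by rejection sampling, using two agents per game, and fixing the test set with a seed are assumptions rather than details confirmed by the paper.

```python
import random
from dataclasses import dataclass

# Constants taken from the table rows; the names themselves are hypothetical.
NUM_ITEM_TYPES = 3     # pools and utilities are 3-dimensional vectors
MAX_QUANTITY = 5       # each item quantity is in {0, ..., 5}
MAX_UTILITY = 10       # each per-unit reward is in {0, ..., 10}
BATCH_SIZE = 128       # games per training episode
NUM_TEST_BATCHES = 5   # held-out batches forming the test set

# Reported training hyperparameters (Experiment Setup row), collected for reference.
HYPERPARAMS = {
    "hidden_dim": 100,            # embeddings and all hidden states
    "optimizer": "adam",          # default parameter settings
    "entropy_weight": {"term": 0.05, "prop": 0.05, "utt": 0.001},
    "vocab_size": 11,
    "max_utterance_len": 6,
    "train_episodes": 500_000,
    "eval_every_episodes": 50,
}


@dataclass
class Game:
    item_pool: list[int]                    # quantity of each item type
    utilities: tuple[list[int], list[int]]  # hidden utility vector per agent (assumed 2 agents)


def sample_item_pool(rng: random.Random) -> list[int]:
    """Draw each item quantity uniformly from {0, ..., MAX_QUANTITY}."""
    return [rng.randint(0, MAX_QUANTITY) for _ in range(NUM_ITEM_TYPES)]


def sample_utility(rng: random.Random) -> list[int]:
    """Draw a utility vector uniformly among those with at least one non-zero entry.

    Rejection sampling (an assumption, not stated in the paper) keeps the
    distribution uniform over all valid vectors.
    """
    while True:
        u = [rng.randint(0, MAX_UTILITY) for _ in range(NUM_ITEM_TYPES)]
        if any(u):
            return u


def sample_game(rng: random.Random) -> Game:
    """One game: a shared item pool plus an independent hidden utility per agent."""
    return Game(sample_item_pool(rng), (sample_utility(rng), sample_utility(rng)))


def sample_batch(rng: random.Random, batch_size: int = BATCH_SIZE) -> list[Game]:
    """One episode's worth of games."""
    return [sample_game(rng) for _ in range(batch_size)]


def make_test_set(seed: int = 0) -> list[list[Game]]:
    """Seeded held-out batches, so every evaluation sees the same games (assumed)."""
    rng = random.Random(seed)
    return [sample_batch(rng) for _ in range(NUM_TEST_BATCHES)]


if __name__ == "__main__":
    test_set = make_test_set()
    train_rng = random.Random(1)
    batch = sample_batch(train_rng)
    print(len(batch), len(test_set))  # 128 5
```

The sketch draws training batches from a separate generator and does not filter held-out games out of training, since the Dataset Splits row does not say whether the authors did so.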