Emergent Communication through Negotiation
Authors: Kris Cao, Angeliki Lazaridou, Marc Lanctot, Joel Z Leibo, Karl Tuyls, Stephen Clark
ICLR 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In our first experiment, we test whether purely self-interested agents can learn to negotiate and divide up items fairly, and investigate the effect of the various communication channels on negotiation success. We train self-interested agents to negotiate for 500k episodes. Each episode corresponds to a batch of 128 games, each with item pools and hidden utilities generated as described in Section 2.1. |
| Researcher Affiliation | Collaboration | Kris Cao, Department of Computer Science and Technology, University of Cambridge, UK; Angeliki Lazaridou, Marc Lanctot, Joel Z Leibo, Karl Tuyls, Stephen Clark, DeepMind, London, UK |
| Pseudocode | No | The paper describes the agent architecture and learning process in detailed paragraphs, but it does not include any structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide an explicit statement about releasing its source code, nor does it include a link to a code repository for the methodology described. |
| Open Datasets | No | The paper states that 'At each round (i) an item pool is sampled uniformly, instantiating a quantity (between 0 and 5) for each of the types and represented as a vector i ∈ {0...5}³ and (ii) each agent j receives a utility function sampled uniformly, which specifies how rewarding one unit of each item is (with item rewards between 0 and 10, and with the constraint that there is at least one item with non-zero utility), represented as a vector u_j ∈ {0...10}³.' This indicates that data is procedurally generated rather than sourced from a publicly available dataset with concrete access information. A minimal sampling sketch of this generation process follows the table. |
| Dataset Splits | No | The paper states 'We also hold out 5 batches worth of games as a test set, and test the agents every 50 episodes', indicating a train-test split, but no explicit validation set or split percentages are provided. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU/CPU models, processor types, or memory amounts used for running its experiments. |
| Software Dependencies | No | The paper mentions using 'the ADAM optimizer (Kingma & Ba, 2014)', but it does not provide specific version numbers for software components or libraries. |
| Experiment Setup | Yes | Embedding sizes, and all neural network hidden states, had dimension 100. We used the ADAM optimizer (Kingma & Ba, 2014), with default parameter settings, to optimize the parameters of each agent. [...] We used a separate value of λ, the entropy regularisation weight hyperparameter, for each policy. For π_term and π_prop, λ = 0.05; for π_utt, λ = 0.001. The symbol vocabulary size was 11, and the agents were allowed to generate utterances of up to length 6. These values are consolidated in the configuration sketch following the table. |
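The procedural game generation quoted in the Open Datasets row, combined with the training protocol quoted under Research Type, describes a simple sampler: each episode is a batch of 128 games, each with a uniformly sampled item pool over three item types and one hidden utility vector per agent. Below is a minimal sketch of that sampler, assuming NumPy; all names (`sample_item_pool`, `sample_utilities`, `sample_batch`) are illustrative and not taken from the authors' (unreleased) code.

```python
import numpy as np

NUM_ITEM_TYPES = 3   # three item types per game
MAX_QUANTITY = 5     # quantities drawn uniformly from {0, ..., 5}
MAX_UTILITY = 10     # per-item utilities drawn uniformly from {0, ..., 10}

def sample_item_pool(rng: np.random.Generator) -> np.ndarray:
    """Sample an item pool i ∈ {0...5}³ uniformly."""
    return rng.integers(0, MAX_QUANTITY + 1, size=NUM_ITEM_TYPES)

def sample_utilities(rng: np.random.Generator) -> np.ndarray:
    """Sample a hidden utility vector u_j ∈ {0...10}³ uniformly,
    rejecting the all-zero vector (the paper requires at least one
    item with non-zero utility)."""
    while True:
        u = rng.integers(0, MAX_UTILITY + 1, size=NUM_ITEM_TYPES)
        if u.any():
            return u

def sample_batch(rng: np.random.Generator, batch_size: int = 128) -> list:
    """One training episode is a batch of 128 independent games, each
    with its own item pool and one hidden utility vector per agent."""
    return [
        {
            "pool": sample_item_pool(rng),
            "utilities": (sample_utilities(rng), sample_utilities(rng)),
        }
        for _ in range(batch_size)
    ]

rng = np.random.default_rng(0)
episode = sample_batch(rng)  # 128 games for one training episode
```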
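Likewise, the hyperparameters quoted in the Experiment Setup row, together with the training and evaluation details quoted under Research Type and Dataset Splits, fit into a single configuration object. The dataclass below is a hypothetical consolidation: the field names are ours, and only the values come from the paper.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class NegotiationConfig:
    # Architecture: embeddings and all hidden states share one dimension
    hidden_dim: int = 100
    # Communication channel
    vocab_size: int = 11           # symbol vocabulary size
    max_utterance_len: int = 6     # maximum generated utterance length
    # Optimization: ADAM with default parameter settings
    optimizer: str = "adam"
    # Entropy regularisation weight λ, one per policy head
    lambda_term: float = 0.05      # termination policy π_term
    lambda_prop: float = 0.05      # proposal policy π_prop
    lambda_utt: float = 0.001      # utterance policy π_utt
    # Training and evaluation schedule
    num_episodes: int = 500_000    # self-interested agents trained for 500k episodes
    batch_size: int = 128          # games per episode
    test_every: int = 50           # test the agents every 50 episodes
    num_test_batches: int = 5      # 5 batches of games held out as a test set

config = NegotiationConfig()
```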