Emergent Communication through Negotiation

Authors: Kris Cao, Angeliki Lazaridou, Marc Lanctot, Joel Z Leibo, Karl Tuyls, Stephen Clark

ICLR 2018

Reproducibility assessment. Each entry below gives the variable, the assessed result, and the supporting LLM response.
Research Type: Experimental
The paper reports: "In our first experiment, we test whether purely self-interested agents can learn to negotiate and divide up items fairly, and investigate the effect of the various communication channels on negotiation success. We train self-interested agents to negotiate for 500k episodes. Each episode corresponds to a batch of 128 games, each with item pools and hidden utilities generated as described in Section 2.1."
Researcher Affiliation: Collaboration
Kris Cao: Department of Computer Science and Technology, University of Cambridge, UK. Angeliki Lazaridou, Marc Lanctot, Joel Z Leibo, Karl Tuyls, Stephen Clark: DeepMind, London, UK.
Pseudocode: No
The paper describes the agent architecture and learning process in detailed paragraphs, but it does not include any structured pseudocode or algorithm blocks.
Open Source Code: No
The paper neither states that its source code is released nor links to a code repository for the described methodology.
Open Datasets: No
The paper states: "At each round (i) an item pool is sampled uniformly, instantiating a quantity (between 0 and 5) for each of the types and represented as a vector i ∈ {0, ..., 5}^3, and (ii) each agent j receives a utility function sampled uniformly, which specifies how rewarding one unit of each item is (with item rewards between 0 and 10, and with the constraint that there is at least one item with non-zero utility), represented as a vector u_j ∈ {0, ..., 10}^3." This indicates that the data is procedurally generated rather than drawn from a publicly available dataset with concrete access information.
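Because the data is procedurally generated, the quoted sampling procedure fully specifies it. A minimal Python sketch, assuming independent uniform sampling per item type; all names here are illustrative, not identifiers from the authors' (unreleased) code:

```python
import random

NUM_ITEM_TYPES = 3  # three item types, per the paper's setup

def sample_game(rng=random):
    """Sample one negotiation game per the quoted procedure.

    Returns an item pool with 0-5 units of each item type, plus one
    hidden utility vector per agent, with per-unit rewards in 0-10
    and at least one non-zero entry.
    """
    pool = [rng.randint(0, 5) for _ in range(NUM_ITEM_TYPES)]

    def sample_utilities():
        # Resample until the constraint "at least one item with
        # non-zero utility" is satisfied.
        while True:
            u = [rng.randint(0, 10) for _ in range(NUM_ITEM_TYPES)]
            if any(u):
                return u

    return pool, sample_utilities(), sample_utilities()
```

An episode in the paper would then correspond to a batch of 128 such independently sampled games.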
Dataset Splits: No
The paper states that "We also hold out 5 batches worth of games as a test set, and test the agents every 50 episodes", indicating a train-test split, but no explicit validation set or split percentages are provided.
Hardware Specification: No
The paper does not provide specific hardware details such as GPU/CPU models, processor types, or memory amounts used for running its experiments.
Software Dependencies: No
The paper mentions using "the ADAM optimizer (Kingma & Ba, 2014)", but it does not provide version numbers for software components or libraries.
Experiment Setup: Yes
The paper specifies: "Embedding sizes, and all neural network hidden states, had dimension 100. We used the ADAM optimizer (Kingma & Ba, 2014), with default parameter settings, to optimize the parameters of each agent. [...] We used a separate value of λ, the entropy regularisation weight hyperparameter, for each policy. For π_term and π_prop, λ = 0.05; for π_utt, λ = 0.001. The symbol vocabulary size was 11, and the agents were allowed to generate utterances of up to length 6."
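For reference, the hyperparameters stated in the paper can be collected into one configuration. A sketch, where the dictionary keys are our own naming (the paper's code is not released) and the values come from the quotes in this assessment:

```python
# Hyperparameters from "Emergent Communication through Negotiation".
# Key names are illustrative; values are as quoted in the paper.
HPARAMS = {
    "hidden_dim": 100,             # embeddings and all hidden states
    "optimizer": "adam",           # ADAM with default settings
    "entropy_weight": {            # per-policy entropy-regularisation λ
        "termination_policy": 0.05,   # π_term
        "proposal_policy": 0.05,      # π_prop
        "utterance_policy": 0.001,    # π_utt
    },
    "vocab_size": 11,              # symbol vocabulary size
    "max_utterance_length": 6,
    "episodes": 500_000,           # from the Research Type entry
    "batch_size": 128,             # games per episode
}
```

A reimplementation attempt could start from this dictionary, though choices such as weight initialisation and learning-rate schedule remain unspecified by the paper.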