Emergence of Linguistic Communication from Referential Games with Symbolic and Pixel Input
Authors: Angeliki Lazaridou, Karl Moritz Hermann, Karl Tuyls, Stephen Clark
ICLR 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Here we scale up this research by using contemporary deep learning methods and by training reinforcement-learning neural network agents on referential communication games. We extend previous work, in which agents were trained in symbolic environments, by developing agents which are able to learn from raw pixel data, a more challenging and realistic input representation. We find that the degree of structure found in the input data affects the nature of the emerged protocols, and thereby corroborate the hypothesis that structured compositional language is most likely to emerge when agents perceive the world as being structured. |
| Researcher Affiliation | Industry | Angeliki Lazaridou, Karl Moritz Hermann, Karl Tuyls, Stephen Clark DeepMind, London, UK |
| Pseudocode | No | No pseudocode or algorithm blocks were found in the paper. |
| Open Source Code | No | The paper does not provide any explicit statement or link indicating that the source code for the described methodology is publicly available. |
| Open Datasets | Yes | We use the Visual Attributes for Concepts Dataset (VisA) of Silberer et al. (2013), which contains human-generated per-concept attribute annotations for 500 concrete concepts... Co-occurrence data is extracted from the MSCOCO caption dataset (Lin et al., 2014). We use a synthetic dataset of scenes consisting of geometric objects generated using the MuJoCo physics engine (Todorov et al., 2012). |
| Dataset Splits | Yes | For each game, we create train and test splits with proportions 75/25 (i.e., 3000/1000 for games A and B, and 1850/650 for games C and D). |
| Hardware Specification | No | The paper describes network architectures (e.g., 'single-layer LSTM', '8-layer convolutional neural network') but does not specify the physical hardware (e.g., GPU/CPU models, memory) used for running the experiments. |
| Software Dependencies | No | The paper mentions optimizers and activation functions but does not provide specific software dependencies with version numbers (e.g., programming language, framework, or library versions) used for the experiments. |
| Experiment Setup | Yes | All LSTM hidden states of the speaking and listening modules, as well as the seeing pre-linguistic feed-forward encoders (see Section 3), have dimension 50. The seeing pre-linguistic ConvNet encoder (see Section 4) has 8 layers, 32 filters with kernel size 3 for every layer, and strides [2, 1, 1, 2, 1, 2, 1, 2] for each layer. We use ReLU as the activation function as well as batch normalization for every layer. For learning, we used the RMSProp optimizer with learning rate 0.0001. We use a separate value of entropy regularization for each policy: 0.01 for πS and 0.001 for πL. We use a mini-batch of 32. |
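The convolutional encoder described in the Experiment Setup row can be checked with a small sketch: applying the standard convolution output-size formula layer by layer shows how the four stride-2 layers downsample the input. The input resolution of 128 and padding of 1 are assumptions for illustration; the paper's table excerpt does not state them.

```python
def conv_out(size: int, kernel: int = 3, stride: int = 1, padding: int = 1) -> int:
    """Standard convolution output-size formula: floor((W + 2P - K) / S) + 1."""
    return (size + 2 * padding - kernel) // stride + 1

# Strides of the 8-layer ConvNet encoder, as reported in the setup row.
strides = [2, 1, 1, 2, 1, 2, 1, 2]

size = 128  # assumed input resolution (not stated in the excerpt)
for i, s in enumerate(strides, 1):
    size = conv_out(size, stride=s)
    print(f"layer {i}: stride {s} -> {size}x{size} feature map")

# With these assumptions, the four stride-2 layers halve the spatial
# size four times: 128 -> 64 -> 32 -> 16 -> 8.
print(size)
```

Each of the 8 layers keeps 32 filters, so under these assumptions the encoder output would be a 32-channel 8x8 feature map before it is passed to the linguistic modules.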