Compositional Obverter Communication Learning from Raw Visual Input
Authors: Edward Choi, Angeliki Lazaridou, Nando de Freitas
ICLR 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiment results suggest that the agents could develop, out of raw visual input, a language with compositional properties, given a proper pressure from the environment (i.e. the image description game). In this section, we first study the convergence behavior during the training phase. Then we analyze the language developed by the agents in terms of compositionality. |
| Researcher Affiliation | Collaboration | Edward Choi Georgia Institute of Technology Atlanta, GA, USA mp2893@gatech.edu Angeliki Lazaridou & Nando de Freitas DeepMind London, UK {angeliki, nandodefreitas}@google.com |
| Pseudocode | Yes | Algorithm 1: Message generation process used in Batali (1998). Algorithm 2: Message generation process used in our work. |
| Open Source Code | No | The paper does not provide any explicit statements or links indicating that the source code for the described methodology is publicly available. |
| Open Datasets | No | We generated synthetic images using the MuJoCo physics simulator. The example images are shown in Figure 2. Each image depicts a single object with a specific color and shape in 128×128 resolution. There are eight colors (blue, red, white, gray, yellow, green, cyan, magenta) and five shapes (box, sphere, cylinder, capsule, ellipsoid), giving us 40 combinations. We generated 100 variations for each of the 40 object types. The paper describes how the dataset was generated but does not provide access information (link, citation, or repository) for the generated dataset itself. |
| Dataset Splits | No | The paper describes how mini-batches are constructed during training and a held-out set for zero-shot testing, but it does not specify a distinct validation dataset split with percentages or counts for hyperparameter tuning or early stopping. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., GPU/CPU models, memory) used to run the experiments, only mentioning the software libraries used for implementation. |
| Software Dependencies | No | The paper mentions using 'TensorFlow and the Sonnet library for all implementation' but does not specify their version numbers or any other software dependencies with versions. |
| Experiment Setup | Yes | We found twenty games per round, with fifty images per mini-batch, to work well. We repeat the rounds 20,000 times. Further rounds did not improve the results, or even degraded the performance. For vocabulary size (i.e. number of unique symbols) and the maximum message length, we used 5 and 20 respectively, similar to what Batali (1998) used. Note that when generating a message using the obverter technique, the generation process stops as soon as the speaker's (i.e. teacher's) output ŷ becomes bigger than some threshold. In our work, we experimented with various values from 0.5 to 0.95, and found higher values to work better than lower values. We used 0.95 for all our final experiments. |
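The dataset description above (eight colors × five shapes, 100 renders each) fully determines the object-type inventory, even though the images themselves are not released. A minimal sketch of that enumeration, with hypothetical filenames (the paper specifies no naming scheme):

```python
from itertools import product

# Colors and shapes as listed in the paper.
COLORS = ["blue", "red", "white", "gray", "yellow", "green", "cyan", "magenta"]
SHAPES = ["box", "sphere", "cylinder", "capsule", "ellipsoid"]
VARIATIONS = 100  # renders per object type

# One (color, shape) pair per object type: 8 * 5 = 40 types.
object_types = list(product(COLORS, SHAPES))

# Hypothetical filenames for the 40 * 100 = 4000 rendered images.
filenames = [
    f"{color}_{shape}_{i:03d}.png"
    for color, shape in object_types
    for i in range(VARIATIONS)
]
```

Reproducing the dataset would still require the MuJoCo scene setup (camera, lighting, object size jitter) that the paper does not specify.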
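The Experiment Setup row describes the obverter stopping rule: generation halts once the speaker's own predicted probability ŷ exceeds a threshold (0.95 in the final experiments) or the message reaches the maximum length. A sketch of that greedy loop, assuming (as in the obverter technique) the speaker scores candidate messages with its own listener model; `score` is a hypothetical stand-in for that model:

```python
VOCAB_SIZE = 5    # unique symbols, per the paper
MAX_LEN = 20      # maximum message length
THRESHOLD = 0.95  # stop once predicted probability y_hat exceeds this

def generate_message(score, threshold=THRESHOLD):
    """Greedy obverter-style message generation (sketch).

    `score(message)` returns the speaker's own predicted probability
    y_hat that `message` describes the target image. At each step we
    append the symbol that maximizes y_hat, stopping early once the
    best score clears `threshold`.
    """
    message = []
    for _ in range(MAX_LEN):
        best_symbol, best_score = None, -1.0
        for symbol in range(VOCAB_SIZE):
            s = score(message + [symbol])
            if s > best_score:
                best_symbol, best_score = symbol, s
        message.append(best_symbol)
        if best_score > threshold:
            break
    return message
```

With a toy scorer that rewards matching a fixed target prefix, the loop emits the target symbols and stops as soon as the score exceeds 0.95, illustrating why higher thresholds yield longer, more specific messages.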