Emergent Translation in Multi-Agent Communication
Authors: Jason Lee, Kyunghyun Cho, Jason Weston, Douwe Kiela
ICLR 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We compare our model against a number of baselines, including a nearest neighbor method and a recently proposed model (Nakayama & Nishida, 2017) that maps languages and images to a shared space, but lacks communication. We evaluate performance on both word- and sentence-level translation, and show that our model outperforms the baselines in both settings. Additionally, we show... |
| Researcher Affiliation | Collaboration | Jason Lee New York University jason@cs.nyu.edu Kyunghyun Cho New York University Facebook AI Research kyunghyun.cho@nyu.edu Jason Weston Facebook AI Research jase@fb.com Douwe Kiela Facebook AI Research dkiela@fb.com |
| Pseudocode | No | No pseudocode or clearly labeled algorithm blocks are present in the paper. The model architecture and training process are described in prose and mathematical equations. |
| Open Source Code | No | No explicit statement or link providing open-source code for the methodology described in this paper was found. |
| Open Datasets | Yes | We use the Bergsma500 dataset (Bergsma & Van Durme, 2011)... The Multi30k (Elliott et al., 2016) dataset... We use MS COCO (Lin et al., 2014; Chen et al., 2015), which contains 120k images and 5 English captions per image, and STAIR (Yoshikawa et al., 2017), a collection of Japanese annotations of the same dataset (also 5 per image). |
| Dataset Splits | Yes | We train on 80% of the images, and choose the model with the best communication accuracy on the 20% validation set when reporting translation performance. ... We use the original data split: 29k training, 1k validation and 1k test images. ... Following Karpathy & Li (2015), we use 110k training, 5k validation and 5k test images. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used to run the experiments, such as GPU models, CPU types, or memory specifications. |
| Software Dependencies | No | The paper mentions several software components like "Adam optimizer", "pre-trained ResNet with 50 layers", "Moses", "byte pair encoding (BPE) algorithm", "Gumbel-softmax", and "REINFORCE", but it does not specify any version numbers for these, which is required for reproducibility. |
| Experiment Setup | Yes | We train with 1 distractor (K = 2), learning rate 3e-4, and minibatch size 128. The embedding and hidden state dimensionalities are set to 400. ... We train with 1 distractor (K = 2) and minibatch size 64. The hidden state size and embedding dimensionalities are 1024 and 512, respectively. The learning rate and dropout rate are tuned on the validation set for each task. |
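The Software Dependencies row notes that the paper relies on Gumbel-softmax for differentiable sampling of discrete messages between agents. As a point of reference for reproducers, here is a minimal NumPy sketch of Gumbel-softmax sampling; the function name and the example logits are illustrative assumptions, not taken from the paper's (unreleased) code.

```python
import numpy as np

def gumbel_softmax_sample(logits, temperature=1.0, rng=None):
    """Return a relaxed (continuous) sample from a categorical
    distribution with the given unnormalized log-probabilities.

    Lower temperatures push the output closer to a one-hot vector.
    """
    rng = np.random.default_rng() if rng is None else rng
    # Gumbel(0, 1) noise via the inverse-CDF trick.
    gumbel = -np.log(-np.log(rng.uniform(size=np.shape(logits))))
    y = (np.asarray(logits, dtype=float) + gumbel) / temperature
    y = y - y.max()  # subtract max for numerical stability
    expy = np.exp(y)
    return expy / expy.sum()

# Example: relaxed sample over a 3-symbol message vocabulary.
sample = gumbel_softmax_sample(np.log([0.7, 0.2, 0.1]),
                               temperature=0.5,
                               rng=np.random.default_rng(0))
```

In a straight-through variant (as often paired with Gumbel-softmax), the forward pass would use `argmax` of this sample while gradients flow through the relaxed values.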