Linking Emergent and Natural Languages via Corpus Transfer
Authors: Shunyu Yao, Mo Yu, Yang Zhang, Karthik R Narasimhan, Joshua B. Tenenbaum, Chuang Gan
ICLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Through a series of experiments, we find that corpus transfer is helpful when the downstream natural language resource is limited. For example, in a low-resource setup of modeling two million natural language tokens, such a transfer scheme reduces the test perplexity by 24.6% on average versus training from scratch, across ten different downstream languages. (A worked example of this relative perplexity reduction is sketched below the table.) |
| Researcher Affiliation | Collaboration | Shunyu Yao (Princeton University), Mo Yu (WeChat AI), Yang Zhang (MIT-IBM Watson AI Lab), Karthik Narasimhan (Princeton University), Joshua B. Tenenbaum (MIT), Chuang Gan (MIT-IBM Watson AI Lab) |
| Pseudocode | No | The paper describes methods and processes through text and equations (e.g., in Section 3.1) but does not include any explicit pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code at https://github.com/ysymyth/ec-nl and correspondence to shunyuy@princeton.edu. |
| Open Datasets | Yes | We scrape Wikipedia corpora of 10 languages to test downstream transfer... Implementation: We train the EC game and generate the emergent-communication (EC) corpus based on the Conceptual Captions dataset (Sharma et al., 2018), using more than 2.8 million natural images in the wild... Fine-tuning Data: We use the MS-COCO dataset (Lin et al., 2014) for fine-tuning... For downstream language modeling, we use ImageNet (Deng et al., 2009) to generate a corpus of 15 million tokens and fine-tune on Romanian (ro) and Hebrew (he). (A hedged data-loading sketch covering these corpora and the low-resource subsets appears below the table.) |
| Dataset Splits | Yes | We report the test perplexity at the best validation loss. ... We use the full training set, or a subset with 5,000 or 50,000 samples to study the transfer benefit when natural language annotation is limited. |
| Hardware Specification | Yes | Each game training takes less than 12 hours using one GeForce RTX 2080 GPU. ... A pre-training experiment can finish within one hour using one GeForce RTX 3090 GPU, while a fine-tuning or training-from-scratch experiment can finish within one hour using one GeForce RTX 2080 GPU. ... Pre-training on Conceptual Captions takes 8 GeForce RTX 3090 GPUs for around two days. |
| Software Dependencies | No | The paper mentions using 'Huggingface's transformers' (Wolf et al., 2019) and 'FAIRSEQ' (Ott et al., 2019) with references, but does not specify the exact version numbers of these or other software dependencies used for their own implementation. |
| Experiment Setup | Yes | Other architecture and training details mainly follow Li et al. (2020b), and by default V = 4035, T = 15, K = 256. For language modeling, we adopt a Transformer (Vaswani et al., 2017) with 6 decoder layers and 6 attention heads, and pre-train on each source corpus for 3,000 steps with batch size 32, input length 1,000, and learning rate 5 × 10⁻⁴. For fine-tuning and training from scratch on downstream corpora, the batch size is 8 and the learning rate is 10⁻⁴. (A configuration sketch with these hyperparameters appears below the table.) |
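
The 24.6% figure in the Research Type row is a relative reduction in test perplexity versus training from scratch. A minimal sketch of that metric, assuming the average is taken over per-language relative reductions and using made-up perplexity values purely for illustration:

```python
# Illustration of the "average relative perplexity reduction" metric.
# Assumption: the average is over per-language reductions
# 1 - ppl_transfer / ppl_scratch. The numbers below are invented and
# are NOT results from the paper.
ppl_scratch = {"ro": 120.0, "he": 150.0}   # training from scratch
ppl_transfer = {"ro": 90.0, "he": 115.0}   # after corpus transfer

reductions = [1 - ppl_transfer[lang] / ppl_scratch[lang] for lang in ppl_scratch]
print(f"average relative reduction: {sum(reductions) / len(reductions):.1%}")
# A 24.6% reduction means a from-scratch perplexity of 100 would drop to 75.4.
```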
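
The Open Datasets and Dataset Splits rows describe Wikipedia corpora for ten downstream languages and low-resource subsets of 5,000 or 50,000 samples. Below is a minimal sketch of assembling such data with the Hugging Face `datasets` library; the hub ID `wikimedia/wikipedia` and its dated config are assumptions about one convenient source, not the authors' own scraping pipeline.

```python
# Hypothetical data preparation for one downstream language (Romanian here).
# The dataset ID and config name are assumptions; the paper scraped its own
# Wikipedia corpora rather than using this hub dataset.
from datasets import load_dataset

wiki_ro = load_dataset("wikimedia/wikipedia", "20231101.ro", split="train")

# Mirror the low-resource study by keeping only a small subset of samples
# (the Dataset Splits row mentions 5,000- and 50,000-sample subsets).
wiki_ro_small = wiki_ro.select(range(5_000))
print(wiki_ro_small[0]["text"][:200])
```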
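
The Experiment Setup row lists the language-model architecture and optimization hyperparameters. Here is a minimal sketch of such a setup, assuming a GPT-2-style decoder-only model from Hugging Face `transformers`; the vocabulary size, embedding width, and choice of Adam are assumptions not stated in the row.

```python
# Decoder-only Transformer LM roughly matching the reported setup:
# 6 decoder layers, 6 attention heads, input length ~1,000 tokens,
# lr 5e-4 for pre-training and 1e-4 for fine-tuning / from scratch.
import torch
from transformers import GPT2Config, GPT2LMHeadModel

config = GPT2Config(
    vocab_size=30_000,   # assumption: depends on the tokenizer used
    n_positions=1024,    # covers the reported input length of 1,000 tokens
    n_layer=6,           # 6 decoder layers
    n_head=6,            # 6 attention heads
    n_embd=384,          # assumption: must be divisible by n_head
)
model = GPT2LMHeadModel(config)

# Pre-training on a source (emergent-language) corpus: 3,000 steps, batch 32.
pretrain_opt = torch.optim.Adam(model.parameters(), lr=5e-4)
# Fine-tuning or training from scratch on a downstream corpus: batch 8.
finetune_opt = torch.optim.Adam(model.parameters(), lr=1e-4)
```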