Evaluating the Cross-Lingual Effectiveness of Massively Multilingual Neural Machine Translation
Authors: Aditya Siddhant, Melvin Johnson, Henry Tsai, Naveen Ari, Jason Riesa, Ankur Bapna, Orhan Firat, Karthik Raman
AAAI 2020, pp. 8854-8861 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this paper, we evaluate the cross-lingual effectiveness of representations from the encoder of a massively multilingual NMT model on 5 downstream classification and sequence labeling tasks covering a diverse set of over 50 languages. We compare against a strong baseline, multilingual BERT (mBERT) (Devlin et al. 2018), in different cross-lingual transfer learning scenarios and show gains in zero-shot transfer in 4 out of these 5 tasks. |
| Researcher Affiliation | Industry | Aditya Siddhant, Melvin Johnson, Henry Tsai, Naveen Ari, Jason Riesa, Ankur Bapna, Orhan Firat, Karthik Raman Google Research {adisid, melvinp, henrytsai, navari, reisa, ankurbpn, orhanf, karthikraman}@google.com |
| Pseudocode | No | The paper does not contain any pseudocode or clearly labeled algorithm blocks. |
| Open Source Code | No | The paper mentions open-source implementations and provides a footnote link to the SentencePiece GitHub page (https://github.com/google/sentencepiece), but it does not state that the authors release the code for their own Massively Multilingual Translation Encoder (MMTE) or their experimental setup. |
| Open Datasets | Yes | We train our multilingual NMT system on a massive scale, using an in-house corpus generated by crawling and extracting parallel sentences from the web (Uszkoreit et al. 2010). This corpus contains parallel documents for 102 languages, to and from English, comprising a total of 25 billion sentence pairs. |
| Dataset Splits | Yes | The in-language setting has training, development and test sets from the language. In the zero-shot setting, the train and dev sets contain only English examples but we test on all the languages. |
| Hardware Specification | No | The paper describes the model architecture and parameters ('Transformer Big containing 375M parameters'), but does not specify the hardware (e.g., CPU, GPU models, memory) used for training or inference. |
| Software Dependencies | No | The paper mentions the use of the 'Transformer architecture (Vaswani et al. 2017) in the open-source implementation under the Lingvo framework (Shen et al. 2019)' and a 'sentence-piece model (SPM) (Kudo and Richardson 2018)'. While these tools are named, specific version numbers for Lingvo or SentencePiece are not provided in the text or footnotes. A hedged SentencePiece usage sketch follows the table. |
| Experiment Setup | Yes | We use a larger version of Transformer Big containing 375M parameters (6 layers, 16 heads, 8192 hidden dimension) (Chen et al. 2018), and a shared source-target sentence-piece model (SPM) (Kudo and Richardson 2018) vocabulary with 64k individual tokens. All our models are trained with Adafactor (Shazeer and Stern 2018) with momentum factorization, a learning rate schedule of (3.0, 40k) and a per-parameter norm clipping threshold of 1.0. A hedged sketch of this configuration also follows the table. |
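
For concreteness, here is a minimal sketch of how a shared 64k-token SentencePiece vocabulary of the kind the paper describes could be built and applied with the open-source `sentencepiece` Python package. The corpus file, model prefix, and example sentence are hypothetical placeholders; the paper does not release its actual tokenizer configuration.

```python
import sentencepiece as spm

# Hedged sketch: train a shared source-target SentencePiece model with a
# 64k-token vocabulary, as described in the paper. The input file and model
# prefix are hypothetical placeholders, not artifacts from the paper.
spm.SentencePieceTrainer.train(
    input="parallel_sentences.txt",   # hypothetical corpus file, one sentence per line
    model_prefix="shared_spm_64k",    # hypothetical output name
    vocab_size=64000,                 # "64k individual tokens" per the paper
)

# Load the trained model and tokenize a sentence into subword pieces.
sp = spm.SentencePieceProcessor(model_file="shared_spm_64k.model")
print(sp.encode("Massively multilingual NMT shares one subword vocabulary.", out_type=str))
```

On a corpus anywhere near the paper's 25 billion sentence pairs, training would in practice subsample the input (e.g. via SentencePiece's `input_sentence_size` and `shuffle_input_sentence` options), but the paper gives no such details.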
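
The "(3.0, 40k)" learning-rate schedule is quoted from the paper without an explicit formula; one common reading, following Transformer-style schedules such as that of Chen et al. (2018), is a 3.0 multiplier with 40k warm-up steps. The sketch below encodes that assumption together with the other reported hyperparameters; the dictionary keys, function name, and the `d_model=1024` value are illustrative, not taken from the authors' Lingvo configuration.

```python
# Hedged sketch of the reported MMTE training configuration. Key names are
# illustrative; only the values quoted in the paper come from the source.
MMTE_SETUP = {
    "architecture": "Transformer Big variant (Chen et al. 2018)",
    "parameters": 375_000_000,
    "layers": 6,
    "attention_heads": 16,
    "hidden_dim": 8192,
    "vocab_size": 64_000,          # shared source-target SentencePiece vocabulary
    "optimizer": "Adafactor with momentum factorization",
    "lr_schedule": (3.0, 40_000),  # (scale, warm-up steps) -- interpretation assumed
    "norm_clipping": 1.0,          # per-parameter norm clipping threshold
}


def transformer_lr(step: int, scale: float = 3.0, warmup_steps: int = 40_000,
                   d_model: int = 1024) -> float:
    """One common reading of a "(scale, warm-up)" schedule: the standard
    Transformer inverse-square-root schedule. d_model=1024 is an assumption
    (typical for Transformer Big); the paper does not state the exact formula."""
    step = max(step, 1)  # avoid division by zero at step 0
    return scale * d_model ** -0.5 * min(step ** -0.5, step * warmup_steps ** -1.5)
```

Under this reading, `transformer_lr(40_000)` gives the peak rate at the end of warm-up, after which the rate decays proportionally to `step ** -0.5`.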