Meta Back-Translation
Authors: Hieu Pham, Xinyi Wang, Yiming Yang, Graham Neubig
ICLR 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In our evaluations in both the standard datasets WMT En-De 14 and WMT En-Fr 14, as well as a multilingual translation setting, our method leads to significant improvements over strong baselines. |
| Researcher Affiliation | Academia | Anonymous authors. Paper under double-blind review. (The reviewed copy is anonymized; the affiliation is inferred from the published author list.) |
| Pseudocode | No | The paper does not contain any clearly labeled pseudocode or algorithm blocks. Figure 1 is an illustrative example, not pseudocode. |
| Open Source Code | No | The paper states it uses existing architectures and frameworks ('Transformer-Base architecture (Vaswani et al., 2017)' and 'fairseq (Ott et al., 2019)') but does not provide a link to, or any statement about, its own source code for Meta BT. |
| Open Datasets | Yes | For the standard setting, we consider two large datasets: WMT En-De 2014 and WMT En-Fr 2014, tokenized with SentencePiece (Kudo & Richardson, 2018) using a joint vocabulary size of 32K for each dataset. ... The multilingual setting uses the multilingual TED talk dataset (Qi et al., 2018). (A tokenization sketch follows the table.) |
| Dataset Splits | Yes | For the standard setting, we consider two large datasets: WMT En-De 2014 and WMT En-Fr 2014 (footnote: http://www.statmt.org/wmt14/). ... we also have a separate validation set for hyper-parameter tuning and model selection. |
| Hardware Specification | Yes | All experiments are conducted on 8 NVIDIA V100 GPUs. |
| Software Dependencies | No | The paper mentions 'fairseq (Ott et al., 2019)' and 'Adam (Kingma & Ba, 2015)' but does not provide specific version numbers for these software components. |
| Experiment Setup | Yes | Optimizer: Adam (Kingma & Ba, 2015) with β1 = 0.9 and β2 = 0.98. The initial learning rate is 5e-4 and is warmed up for 4000 steps, then decayed using inverse square root. Label smoothing: 0.1. Dropout: 0.3. Min-max batching for parallel data, with 4096 tokens per batch. For monolingual data, the batch size is 64 sentences for WMT En-De, 16 for WMT En-Fr, and 8 for the multilingual data. (A training-schedule sketch follows the table.) |
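
For reference, the joint 32K SentencePiece vocabulary described in the Open Datasets row can be built roughly as follows. This is a minimal sketch, not the authors' script: the file names, `model_type`, and `character_coverage` values are assumptions, since the paper only states the tokenizer and the vocabulary size.

```python
import sentencepiece as spm

# Train a joint source+target SentencePiece model with a 32K vocabulary,
# as reported for the WMT'14 En-De / En-Fr setups.
# "train.en,train.de" is a placeholder for the concatenated parallel data.
spm.SentencePieceTrainer.train(
    input="train.en,train.de",
    model_prefix="wmt14_ende_joint",
    vocab_size=32000,
    character_coverage=1.0,   # assumption: not stated in the paper
    model_type="unigram",     # SentencePiece default; also not stated in the paper
)

# Apply the trained model to a sentence.
sp = spm.SentencePieceProcessor(model_file="wmt14_ende_joint.model")
print(sp.encode("Machine translation is fun.", out_type=str))
```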
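Likewise, the optimizer settings in the Experiment Setup row (Adam with β1 = 0.9, β2 = 0.98, a peak learning rate of 5e-4, 4000 warmup steps, inverse-square-root decay) map onto the following plain-PyTorch sketch. The model and training loop are placeholders, and fairseq's own `inverse_sqrt` scheduler may differ in minor details such as the warmup-initial learning rate.

```python
import torch

# Placeholder model; the paper uses Transformer-Base (Vaswani et al., 2017) in fairseq.
model = torch.nn.Linear(512, 512)

# Adam with the betas and peak learning rate reported in the paper.
optimizer = torch.optim.Adam(model.parameters(), lr=5e-4, betas=(0.9, 0.98))

WARMUP_STEPS = 4000

def inverse_sqrt(step: int) -> float:
    """Linear warmup for 4000 steps, then inverse-square-root decay."""
    step = max(step, 1)
    if step < WARMUP_STEPS:
        return step / WARMUP_STEPS
    return (WARMUP_STEPS / step) ** 0.5

scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda=inverse_sqrt)

# Stand-in for the real training loop (forward/backward passes omitted).
for step in range(10):
    optimizer.step()
    scheduler.step()
```

In fairseq itself, these hyperparameters correspond roughly to the `--optimizer adam`, `--adam-betas`, `--lr`, `--lr-scheduler inverse_sqrt`, `--warmup-updates`, `--criterion label_smoothed_cross_entropy --label-smoothing 0.1`, `--dropout 0.3`, and `--max-tokens 4096` options, though the paper does not give its exact command line.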