Meta Back-Translation

Authors: Hieu Pham, Xinyi Wang, Yiming Yang, Graham Neubig

ICLR 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In our evaluations on both the standard datasets WMT En-De 14 and WMT En-Fr 14, as well as a multilingual translation setting, our method leads to significant improvements over strong baselines.
Researcher Affiliation | Academia | Anonymous authors. Paper under double-blind review.
Pseudocode | No | The paper does not contain any clearly labeled pseudocode or algorithm blocks. Figure 1 is an illustrative example, not pseudocode.
Open Source Code | No | The paper states it uses existing architectures and frameworks ('Transformer-Base architecture (Vaswani et al., 2017)' and 'fairseq (Ott et al., 2019)') but does not provide a link to, or an explicit release statement for, its own Meta BT source code.
Open Datasets | Yes | For the standard setting, we consider two large datasets: WMT En-De 2014 and WMT En-Fr 2014, tokenized with SentencePiece (Kudo & Richardson, 2018) using a joint vocabulary size of 32K for each dataset. ... The multilingual setting uses the multilingual TED talk dataset (Qi et al., 2018). (A tokenization sketch follows the table.)
Dataset Splits | Yes | For the standard setting, we consider two large datasets: WMT En-De 2014 and WMT En-Fr 2014 (footnote: http://www.statmt.org/wmt14/). ... we also have a separate validation set for hyper-parameter tuning and model selection.
Hardware Specification | Yes | All experiments are conducted on 8 NVIDIA V100 GPUs.
Software Dependencies | No | The paper mentions 'fairseq (Ott et al., 2019)' and 'Adam (Kingma & Ba, 2015)' but does not provide specific version numbers for these software components.
Experiment Setup | Yes | Optimizer: Adam (Kingma & Ba, 2015) with β1 = 0.9 and β2 = 0.98. The initial learning rate is 5e-4, warmed up over 4000 steps and then decayed with an inverse-square-root schedule. Label smoothing: 0.1. Dropout: 0.3. Min-max batching for parallel data, with 4096 tokens per batch. For monolingual data, the batch size is 64 sentences for WMT En-De, 16 for WMT En-Fr, and 8 for the multilingual data. (A training-setup sketch follows the table.)
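
Since the paper relies on fairseq's Transformer-Base recipe rather than releasing its own code, the following is only an illustrative PyTorch sketch of the reported optimizer and learning-rate schedule (Adam with β1 = 0.9, β2 = 0.98, peak learning rate 5e-4, 4000 warmup steps, then inverse-square-root decay). The exact warmup shape and the `build_optimizer_and_schedule` helper are assumptions, not the authors' implementation.

```python
import torch
from torch.optim import Adam
from torch.optim.lr_scheduler import LambdaLR


def build_optimizer_and_schedule(model, peak_lr=5e-4, warmup_steps=4000):
    """Adam plus 'linear warmup, then inverse-sqrt decay', mirroring the reported setup.

    The betas, peak LR, and warmup length follow the paper's description;
    the linear warmup shape is an assumption.
    """
    optimizer = Adam(model.parameters(), lr=peak_lr, betas=(0.9, 0.98))

    def lr_factor(step):
        step = max(step, 1)  # avoid division by zero at step 0
        # Linear ramp to the peak LR, then decay proportional to 1/sqrt(step):
        # factor(t) = min(t / warmup, sqrt(warmup / t))
        return min(step / warmup_steps, (warmup_steps / step) ** 0.5)

    scheduler = LambdaLR(optimizer, lr_lambda=lr_factor)
    return optimizer, scheduler


# Label-smoothed cross-entropy with smoothing 0.1, as reported
# (requires PyTorch >= 1.10 for the label_smoothing argument).
criterion = torch.nn.CrossEntropyLoss(label_smoothing=0.1)
```

Dropout 0.3 and the batching scheme (4096 parallel tokens per batch, plus the per-language monolingual batch sizes) are part of the reported setup but are left out of this sketch; in fairseq they would be handled by the model configuration and the data loader rather than the optimizer.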
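
For the tokenization reported in the Open Datasets row (a joint 32K SentencePiece vocabulary per language pair), a minimal sketch with the sentencepiece Python package could look like the following. The file names and the example sentence are hypothetical placeholders; only the 32K joint vocabulary size comes from the paper.

```python
import sentencepiece as spm

# Train one joint model on the concatenated source+target training text
# (file name is hypothetical; the paper only specifies the 32K joint vocabulary).
spm.SentencePieceTrainer.train(
    input="train.en-de.txt",        # concatenated English and German training sides
    model_prefix="spm.ende.joint",  # produces spm.ende.joint.model / .vocab
    vocab_size=32000,
)

# Load the trained model and tokenize a sentence into subword pieces.
sp = spm.SentencePieceProcessor(model_file="spm.ende.joint.model")
print(sp.encode("Meta back-translation adapts the backward model during training.", out_type=str))
```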