Data Diversification: A Simple Strategy For Neural Machine Translation

Authors: Xuan-Phi Nguyen, Shafiq Joty, Kui Wu, Ai Ti Aw

NeurIPS 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this section, we present experiments to demonstrate that our data diversification approach improves translation quality in many translation tasks, encompassing WMT and IWSLT tasks, and high- and low-resource translation tasks.
Researcher Affiliation | Collaboration | Xuan-Phi Nguyen (1,3), Shafiq Joty (1,2), Wu Kui (3), Ai Ti Aw (3); 1: Nanyang Technological University, 2: Salesforce Research, 3: Institute for Infocomm Research (I2R), A*STAR, Singapore
Pseudocode | Yes | Algorithm 1 Data Diversification: Given a dataset D = (S, T), a diversification factor k, the number of rounds N; return a trained source-target translation model M̂_{S→T}. (See the sketch after this table.)
Open Source Code | Yes | Code: https://github.com/nxphi47/data_diversification
Open Datasets | Yes | We conduct experiments on the standard WMT'14 English-German (En-De) and English-French (En-Fr) translation tasks. ... We evaluate our approach in IWSLT'14 English-German (En-De) and German-English (De-En), IWSLT'13 English-French (En-Fr) and French-English (Fr-En) translation tasks. ... We use the English-Nepali and English-Sinhala low-resource setup proposed by Guzmán et al. [10].
Dataset Splits | Yes | We use newstest2013 as the development set... We randomly sample 5% of the training data for validation... We use the IWSLT15.TED.tst2012 set for validation... use their dev set for development.
Hardware Specification | No | The paper mentions scaling the training process to 128 GPUs, but it does not specify the GPU model, CPUs, or any other hardware used for the experiments.
Software Dependencies | No | The paper mentions using the Transformer [24] as the base architecture and references other tools and models (e.g., BERT), but it does not provide specific version numbers for any software dependencies like PyTorch, TensorFlow, or CUDA.
Experiment Setup | Yes | We use the Transformer [24] as our NMT model and follow the same configurations as suggested by Ott et al. [15]. When augmenting the datasets, we filter out the duplicate pairs... The data generation process costs approximately 30% of the time to train the baseline. We average the last 5 checkpoints... unless specified otherwise, we use the default setup of k = 3 and N = 1. (A sketch of the checkpoint-averaging step follows the table.)
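For illustration, below is a minimal Python sketch of the procedure summarised in Algorithm 1, assuming hypothetical `train(src, tgt, seed)` and `translate(model, sentences)` helpers; it is not the authors' implementation, which lives in the repository linked above.

```python
def data_diversification(S, T, train, translate, k=3, N=1):
    """Minimal sketch of Algorithm 1 (Data Diversification).

    S, T      : lists of parallel source / target sentences (dataset D)
    train     : train(src, tgt, seed) -> model            (hypothetical helper)
    translate : translate(model, sentences) -> list[str]  (hypothetical helper)
    k         : diversification factor, N : number of rounds
    Returns the final source->target model trained on the enlarged corpus.
    """
    src, tgt = list(S), list(T)                      # D_0 = original data
    for r in range(N):
        round_src, round_tgt = list(src), list(tgt)  # data available this round
        for i in range(k):
            seed = r * k + i                         # each model gets its own seed
            fwd = train(round_src, round_tgt, seed)  # forward model  S -> T
            bwd = train(round_tgt, round_src, seed)  # backward model T -> S
            # Pair original sources with forward translations ...
            src += round_src
            tgt += translate(fwd, round_src)
            # ... and backward translations with original targets.
            src += translate(bwd, round_tgt)
            tgt += round_tgt
    # The paper filters out duplicate pairs before training the final model.
    pairs = list(dict.fromkeys(zip(src, tgt)))
    src, tgt = [s for s, _ in pairs], [t for _, t in pairs]
    return train(src, tgt, seed=0)                   # final model M_{S->T}
```

With the default setup (k = 3, N = 1), the final model is trained on the original data plus three forward-translated and three back-translated copies of it.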
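The "average the last 5 checkpoints" step corresponds to standard parameter averaging. A minimal sketch, assuming each checkpoint file holds a plain PyTorch state_dict of tensors (real toolkits such as fairseq wrap the weights in a larger checkpoint dict and provide their own averaging script):

```python
import torch

def average_checkpoints(paths):
    """Element-wise average of model parameters across several checkpoints,
    e.g. the last 5 saved during training."""
    avg = None
    for path in paths:
        state = torch.load(path, map_location="cpu")
        if avg is None:
            avg = {name: p.clone().double() for name, p in state.items()}
        else:
            for name, p in state.items():
                avg[name] += p.double()
    # Divide by the number of checkpoints and cast back to float32.
    return {name: (p / len(paths)).float() for name, p in avg.items()}

# Hypothetical usage: average the last 5 epoch checkpoints of a 50-epoch run.
# averaged = average_checkpoints([f"checkpoint{e}.pt" for e in range(46, 51)])
# torch.save(averaged, "checkpoint_avg_last5.pt")
```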