Data Diversification: A Simple Strategy For Neural Machine Translation
Authors: Xuan-Phi Nguyen, Shafiq Joty, Kui Wu, Ai Ti Aw
NeurIPS 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we present experiments to demonstrate that our data diversification approach improves translation quality in many translation tasks, encompassing WMT and IWSLT tasks, and high- and low-resource translation tasks. |
| Researcher Affiliation | Collaboration | Xuan-Phi Nguyen (1,3), Shafiq Joty (1,2), Wu Kui (3), Ai Ti Aw (3); (1) Nanyang Technological University, (2) Salesforce Research, (3) Institute for Infocomm Research (I2R), A*STAR Singapore |
| Pseudocode | Yes | Algorithm 1 Data Diversification: Given a dataset D = (S, T), a diversification factor k, and the number of rounds N; return a trained source-to-target translation model M̂_{S→T}. (A hedged sketch of this procedure follows the table.) |
| Open Source Code | Yes | 1Code: https://github.com/nxphi47/data_diversification |
| Open Datasets | Yes | We conduct experiments on the standard WMT 14 English-German (En-De) and English-French (En-Fr) translation tasks. ... We evaluate our approach in IWSLT 14 English-German (En-De) and German-English (De-En), IWSLT 13 English-French (En-Fr) and French-English (Fr-En) translation tasks. ... We use the English-Nepali and English-Sinhala low-resource setup proposed by Guzmán et al. [10]. |
| Dataset Splits | Yes | We use newstest2013 as the development set... We randomly sample 5% of the training data for validation... We use the IWSLT15.TED.tst2012 set for validation... use their dev set for development |
| Hardware Specification | No | The paper mentions scaling the training process to 128 GPUs, but it does not specify the GPU model or any other hardware details (CPUs, memory) used for the experiments. |
| Software Dependencies | No | The paper mentions using the Transformer [24] as the base architecture and references other tools and models (e.g., BERT), but it does not provide specific version numbers for any software dependencies like PyTorch, TensorFlow, or CUDA. |
| Experiment Setup | Yes | We use the Transformer [24] as our NMT model and follow the same configurations as suggested by Ott et al. [15]. When augmenting the datasets, we filter out the duplicate pairs... The data generation process costs approximately 30% of the time to train the baseline. We average the last 5 checkpoints... unless specified otherwise, we use the default setup of k = 3 and N = 1. |
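
To make the pseudocode row concrete, below is a minimal sketch of Algorithm 1 as described above, using the default k = 3 and N = 1 from the experiment-setup row. The helpers `train_nmt` and `translate` are hypothetical stand-ins for a real NMT toolkit such as fairseq; this is an illustration of the described procedure, not the authors' implementation.

```python
# Minimal sketch of Algorithm 1 (Data Diversification).
# Assumed helpers (not part of the paper's released code):
#   train_nmt(src, tgt, seed) -> trained model
#   translate(model, sentences) -> one translation per input sentence

def data_diversification(S, T, k=3, N=1):
    """S, T: parallel lists of source/target sentences; returns the final S->T model."""
    D = list(zip(S, T))                                   # D_0 = original bitext (S, T)
    for _ in range(N):                                    # N diversification rounds
        src, tgt = zip(*D)
        new_pairs = []
        for seed in range(k):                             # k differently seeded model pairs
            fwd = train_nmt(src=src, tgt=tgt, seed=seed)  # forward model S -> T
            bwd = train_nmt(src=tgt, tgt=src, seed=seed)  # backward model T -> S
            new_pairs += zip(S, translate(fwd, S))        # synthetic targets for original S
            new_pairs += zip(translate(bwd, T), T)        # synthetic sources for original T
        D = list(dict.fromkeys(D + new_pairs))            # merge and drop duplicate pairs
    src, tgt = zip(*D)
    return train_nmt(src=src, tgt=tgt, seed=0)            # final model trained on diversified data
```

With the defaults, the original bitext is merged with k = 3 forward-translated and k = 3 back-translated copies before the final model is trained, which matches the paper's note that duplicate pairs are filtered out when augmenting the datasets.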
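
The experiment-setup row also notes that the last 5 checkpoints are averaged before evaluation. A minimal sketch of that step, assuming PyTorch checkpoints that store their parameters under a "model" key (the fairseq convention), is:

```python
import torch

def average_checkpoints(paths):
    """Average parameter tensors across checkpoints (e.g. the last 5 saved files)."""
    avg = None
    for path in paths:
        # Assumption: each checkpoint stores its parameters under a "model" key,
        # as fairseq does; adjust the key for other training frameworks.
        state = torch.load(path, map_location="cpu")["model"]
        if avg is None:
            avg = {name: p.clone().float() for name, p in state.items()}
        else:
            for name, p in state.items():
                avg[name] += p.float()
    return {name: p / len(paths) for name, p in avg.items()}
```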