Data Diversification: A Simple Strategy For Neural Machine Translation

Authors: Xuan-Phi Nguyen, Shafiq Joty, Kui Wu, Ai Ti Aw

NeurIPS 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this section, we present experiments to demonstrate that our data diversification approach improves translation quality in many translation tasks, encompassing WMT and IWSLT tasks, and high- and low-resource translation tasks.
Researcher Affiliation | Collaboration | Xuan-Phi Nguyen (1,3), Shafiq Joty (1,2), Wu Kui (3), Ai Ti Aw (3); 1: Nanyang Technological University, 2: Salesforce Research, 3: Institute for Infocomm Research (I2R), A*STAR, Singapore
Pseudocode | Yes | Algorithm 1 Data Diversification: Given a dataset D = (S, T), a diversification factor k, the number of rounds N; return a trained source-target translation model M̂_{S→T}. (See the sketch after this table.)
Open Source Code | Yes | Code: https://github.com/nxphi47/data_diversification
Open Datasets | Yes | We conduct experiments on the standard WMT'14 English-German (En-De) and English-French (En-Fr) translation tasks. ... We evaluate our approach in IWSLT'14 English-German (En-De) and German-English (De-En), IWSLT'13 English-French (En-Fr) and French-English (Fr-En) translation tasks. ... We use the English-Nepali and English-Sinhala low-resource setup proposed by Guzmán et al. [10].
Dataset Splits | Yes | We use newstest2013 as the development set... We randomly sample 5% of the training data for validation... We use the IWSLT15.TED.tst2012 set for validation... use their dev set for development.
Hardware Specification | No | The paper mentions scaling the training process to 128 GPUs, but it does not specify the GPU model, CPUs, or any other hardware used for the experiments.
Software Dependencies | No | The paper mentions using the Transformer [24] as the base architecture and references other tools and models (e.g., BERT), but it does not provide specific version numbers for any software dependencies like PyTorch, TensorFlow, or CUDA.
Experiment Setup | Yes | We use the Transformer [24] as our NMT model and follow the same configurations as suggested by Ott et al. [15]. When augmenting the datasets, we filter out the duplicate pairs... The data generation process costs approximately 30% of the time to train the baseline. We average the last 5 checkpoints... unless specified otherwise, we use the default setup of k = 3 and N = 1. (A sketch of the checkpoint-averaging step follows the table.)
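For illustration, below is a minimal Python sketch of the procedure summarised in Algorithm 1, assuming hypothetical `train(src, tgt, seed)` and `translate(model, sentences)` helpers; it is not the authors' implementation, which lives in the repository linked above.

```python
def data_diversification(S, T, train, translate, k=3, N=1):
    """Minimal sketch of Algorithm 1 (Data Diversification).

    S, T      : lists of parallel source / target sentences (dataset D)
    train     : train(src, tgt, seed) -> model            (hypothetical helper)
    translate : translate(model, sentences) -> list[str]  (hypothetical helper)
    k         : diversification factor, N : number of rounds
    Returns the final source->target model trained on the enlarged corpus.
    """
    src, tgt = list(S), list(T)                      # D_0 = original data
    for r in range(N):
        round_src, round_tgt = list(src), list(tgt)  # data available this round
        for i in range(k):
            seed = r * k + i                         # each model gets its own seed
            fwd = train(round_src, round_tgt, seed)  # forward model  S -> T
            bwd = train(round_tgt, round_src, seed)  # backward model T -> S
            # Pair original sources with forward translations ...
            src += round_src
            tgt += translate(fwd, round_src)
            # ... and backward translations with original targets.
            src += translate(bwd, round_tgt)
            tgt += round_tgt
    # The paper filters out duplicate pairs before training the final model.
    pairs = list(dict.fromkeys(zip(src, tgt)))
    src, tgt = [s for s, _ in pairs], [t for _, t in pairs]
    return train(src, tgt, seed=0)                   # final model M_{S->T}
```

With the default setup (k = 3, N = 1), the final model is trained on the original data plus three forward-translated and three back-translated copies of it.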
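The "average the last 5 checkpoints" step corresponds to standard parameter averaging. A minimal sketch, assuming each checkpoint file holds a plain PyTorch state_dict of tensors (real toolkits such as fairseq wrap the weights in a larger checkpoint dict and provide their own averaging script):

```python
import torch

def average_checkpoints(paths):
    """Element-wise average of model parameters across several checkpoints,
    e.g. the last 5 saved during training."""
    avg = None
    for path in paths:
        state = torch.load(path, map_location="cpu")
        if avg is None:
            avg = {name: p.clone().double() for name, p in state.items()}
        else:
            for name, p in state.items():
                avg[name] += p.double()
    # Divide by the number of checkpoints and cast back to float32.
    return {name: (p / len(paths)).float() for name, p in avg.items()}

# Hypothetical usage: average the last 5 epoch checkpoints of a 50-epoch run.
# averaged = average_checkpoints([f"checkpoint{e}.pt" for e in range(46, 51)])
# torch.save(averaged, "checkpoint_avg_last5.pt")
```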