Empirical Regularization for Synthetic Sentence Pairs in Unsupervised Neural Machine Translation

Authors: Xi Ai, Bin Fang (pp. 12471-12479)

AAAI 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our comprehensive experiments support that our method can generally improve the performance of currently successful models on three similar pairs {French, German, Romanian}-English and one dissimilar pair Russian-English with acceptable additional cost.
Researcher Affiliation | Academia | Xi Ai, Bin Fang, College of Computer Science, Chongqing University; barid.x.ai@gmail.com, fb@cqu.edu.cn
Pseudocode | Yes | Algorithm 1: Local Alignment
Open Source Code | No | We implement our experiments on TensorFlow 2.0 (Abadi et al. 2016) and will open our source code on GitHub.
Open Datasets | Yes | Specifically, we first retrieve monolingual corpora {French, German, English, Russian} from WMT 2018 (Bojar et al. 2018), including all available News Crawl datasets from 2007 through 2017, and the Romanian monolingual corpus from WMT 2016 (Bojar et al. 2016), including News Crawl 2016. (The corpora are summarized in the config sketch after this table.)
Dataset Splits | No | The paper mentions specific test sets ('newstest2014', 'newstest2016') but does not provide explicit details about the train/validation splits (e.g., percentages, sample counts, or references to predefined splits) used from the WMT corpora for reproducing the experiments.
Hardware Specification | No | The paper does not provide specific details about the hardware used for running experiments, such as GPU or CPU models, memory specifications, or cloud computing instance types; it refers only generally to 'our machine'.
Software Dependencies | Yes | We implement our experiments on TensorFlow 2.0 (Abadi et al. 2016).
Experiment Setup | Yes | Adam optimizer (Kingma and Ba 2015) is used with parameters β1 = 0.9, β2 = 0.98, ϵ = 10^-9 and a dynamic learning rate over the course of training (Vaswani et al. 2017) with warmup_steps = 5000. We set dropout regularization with a drop rate of 0.1 and label smoothing with γ = 0.1 (Mezzini 2018). (See the optimizer sketch after this table.)
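
For reference, the corpora described in the Open Datasets row can be summarized as a small config dictionary. This is a hypothetical sketch: the language codes, dictionary layout, and the NEWS_CRAWL_CORPORA name are illustrative and not taken from the paper, which also does not give exact download locations.

    # Hypothetical summary of the monolingual training corpora described above;
    # exact download locations are not given in this section.
    NEWS_CRAWL_CORPORA = {
        "fr": {"wmt": 2018, "news_crawl_years": list(range(2007, 2018))},
        "de": {"wmt": 2018, "news_crawl_years": list(range(2007, 2018))},
        "en": {"wmt": 2018, "news_crawl_years": list(range(2007, 2018))},
        "ru": {"wmt": 2018, "news_crawl_years": list(range(2007, 2018))},
        "ro": {"wmt": 2016, "news_crawl_years": [2016]},
    }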
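
The reported optimizer and regularization settings map directly onto the TensorFlow 2.0 API the paper cites. Below is a minimal sketch, assuming a Transformer width of d_model = 512 (the section does not state the model width); everything else follows the hyperparameters quoted above.

    import tensorflow as tf

    class TransformerSchedule(tf.keras.optimizers.schedules.LearningRateSchedule):
        """Dynamic learning rate of Vaswani et al. (2017):
        lr = d_model**-0.5 * min(step**-0.5, step * warmup_steps**-1.5)."""

        def __init__(self, d_model=512, warmup_steps=5000):  # d_model is assumed
            super().__init__()
            self.d_model = tf.cast(d_model, tf.float32)
            self.warmup_steps = float(warmup_steps)

        def __call__(self, step):
            step = tf.cast(step, tf.float32)
            return tf.math.rsqrt(self.d_model) * tf.minimum(
                tf.math.rsqrt(step), step * self.warmup_steps ** -1.5)

    # Adam with the quoted hyperparameters and the warmup schedule.
    optimizer = tf.keras.optimizers.Adam(
        learning_rate=TransformerSchedule(warmup_steps=5000),
        beta_1=0.9, beta_2=0.98, epsilon=1e-9)

    # Dropout regularization with drop rate 0.1, applied inside the model.
    dropout = tf.keras.layers.Dropout(rate=0.1)

    # Cross-entropy with label smoothing gamma = 0.1.
    loss_fn = tf.keras.losses.CategoricalCrossentropy(
        from_logits=True, label_smoothing=0.1)

The schedule class mirrors the warmup-then-decay rule of the Transformer paper, which the section cites for the dynamic learning rate.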