Frequency-Aware Contrastive Learning for Neural Machine Translation

Authors: Tong Zhang, Wei Ye, Baosong Yang, Long Zhang, Xingzhang Ren, Dayiheng Liu, Jinan Sun, Shikun Zhang, Haibo Zhang, Wen Zhao

AAAI 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We conduct experiments on widely used NIST Chinese-English and WMT14 English-German translation tasks. Empirical results show that our proposed methods can not only significantly improve the translation quality but also enhance lexical diversity and optimize word representation space."
Researcher Affiliation | Collaboration | "Tong Zhang1, Wei Ye1,*, Baosong Yang2, Long Zhang1, Xingzhang Ren2, Dayiheng Liu2, Jinan Sun1,*, Shikun Zhang1, Haibo Zhang2, Wen Zhao1. 1 National Engineering Research Center for Software Engineering, Peking University; 2 Alibaba Group"
Pseudocode | No | The paper does not contain pseudocode or clearly labeled algorithm blocks. (A hypothetical sketch of such an objective, reconstructed only from the hyperparameters reported below, is given after this table.)
Open Source Code | No | The paper states that the models are implemented on top of the THUMT toolkit, but it does not provide a link to, or an explicit statement about releasing, the source code for the proposed methods (FCL/TCL).
Open Datasets | Yes | "For Zh-En translation, we use the LDC corpus as the training set, which consists of 1.25M sentence pairs. ... For En-De translation, the training data contains 4.5M sentence pairs collected from the WMT 2014 En-De dataset."
Dataset Splits | Yes | "We adopt NIST 2006 (MT06) as the validation set and NIST 2002, 2003, 2004, 2005, 2008 datasets as the test sets. For En-De translation, ... We adopt newstest2013 as the validation set and test our model on newstest2014." (The reported corpora and splits are summarized in a configuration sketch after this table.)
Hardware Specification | No | "We use 1 GPU for the NIST Zh-En task and 4 GPUs for WMT14 En-De task." The paper reports the number of GPUs but not their specific model (e.g., NVIDIA V100, RTX 3090) or any other detailed hardware specifications.
Software Dependencies | No | "All the baseline systems and our models are implemented on top of THUMT toolkit (Zhang et al. 2017). ... We adopt Moses tokenizer to deal with English and German sentences, and segment the Chinese sentences with the Stanford Segmenter. ... We employ the Adam optimizer with β2 = 0.998. ... We use multi-bleu.perl to calculate the case-sensitive BLEU score." While several software components are mentioned, specific version numbers (e.g., for THUMT or the Moses tokenizer) are not provided. (The named components are listed after this table.)
Experiment Setup | Yes | "During training, the dropout rate and label smoothing are set to 0.1. We employ the Adam optimizer with β2 = 0.998. ... The batch size is 4096 for each GPU. ... For TCL and FCL, the optimal λ for contrastive learning loss is 2.0. The scale factor γ in FCL is set to be 1.4." (These hyperparameters are collected in a sketch after this table.)
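Since the paper provides no pseudocode, the following is a minimal, hypothetical sketch of what a frequency-aware token-level contrastive objective could look like, pieced together only from the title and the reported hyperparameters (λ = 2.0, γ = 1.4). It assumes an InfoNCE-style loss over decoder token states in which other occurrences of the same target token act as positives and rarer tokens receive larger weights through the scale factor γ; the tensor layout, the temperature, and the exact weighting form are assumptions, not the paper's formulation.

```python
# Hypothetical sketch only -- the exact FCL/TCL formulation is not reproduced here.
import torch
import torch.nn.functional as F

def frequency_aware_contrastive_loss(hidden, token_ids, token_freq,
                                      gamma=1.4, temperature=0.1):
    """
    hidden:     (N, d) decoder states for N target tokens in a batch
    token_ids:  (N,)   vocabulary ids of those tokens
    token_freq: (V,)   relative corpus frequency of each vocabulary entry (0..1)
    """
    n = hidden.size(0)
    eye = torch.eye(n, dtype=torch.bool, device=hidden.device)

    z = F.normalize(hidden, dim=-1)                            # cosine-similarity space
    sim = (z @ z.t() / temperature).masked_fill(eye, float("-inf"))

    # Positives: other occurrences of the same target token in the batch (assumption).
    pos_mask = (token_ids.unsqueeze(0) == token_ids.unsqueeze(1)) & ~eye

    log_prob = sim - torch.logsumexp(sim, dim=-1, keepdim=True)
    per_token = -log_prob.masked_fill(~pos_mask, 0.0).sum(-1) / pos_mask.sum(-1).clamp(min=1)

    # Frequency-aware weighting (assumed form): rarer tokens get larger weights,
    # sharpened by the scale factor gamma (reported as 1.4 in the paper).
    weight = (1.0 - token_freq[token_ids]).pow(gamma)

    has_pos = pos_mask.any(dim=-1)                             # skip tokens with no positive
    return (weight * per_token)[has_pos].mean()

# total_loss = cross_entropy_loss + 2.0 * frequency_aware_contrastive_loss(...)
# (lambda = 2.0 is the contrastive-loss weight reported in the paper)
```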
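The reported corpora and splits, collected into a single place for reference; the dataset names and sizes come from the quotes in the table, while the dict layout itself is only an illustrative convention.

```python
# Reported data configuration; dataset names and sizes are quoted in the table above.
DATA_CONFIG = {
    "zh-en": {
        "train": "LDC corpus, 1.25M sentence pairs",
        "valid": "NIST 2006 (MT06)",
        "test":  ["NIST 2002", "NIST 2003", "NIST 2004", "NIST 2005", "NIST 2008"],
    },
    "en-de": {
        "train": "WMT 2014 En-De, 4.5M sentence pairs",
        "valid": "newstest2013",
        "test":  ["newstest2014"],
    },
}
```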
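The software components named in the paper, with the missing version information made explicit; the dict form only illustrates what a pinned environment description would additionally need to contain.

```python
# Software components named in the paper; no versions are given, which is why
# the Software Dependencies variable above is marked "No".
SOFTWARE_STACK = {
    "THUMT":              "version unspecified (Zhang et al. 2017); training framework",
    "Moses tokenizer":    "version unspecified; English/German tokenization",
    "Stanford Segmenter": "version unspecified; Chinese word segmentation",
    "multi-bleu.perl":    "case-sensitive BLEU scoring",
    "Adam optimizer":     "beta2 = 0.998",
}
```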
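Finally, the reported training hyperparameters in one place. Values not quoted above (learning-rate schedule, Adam beta1, warmup steps) are omitted rather than guessed, and the note that the batch size is counted in tokens is an assumption based on common Transformer practice, not a statement from the paper.

```python
# Training hyperparameters as reported in the table above; unreported values are omitted.
HYPERPARAMS = {
    "dropout":                 0.1,
    "label_smoothing":         0.1,
    "optimizer":               "Adam",
    "adam_beta2":              0.998,
    "batch_size_per_gpu":      4096,   # likely tokens per GPU (assumption)
    "contrastive_loss_weight": 2.0,    # lambda, for both TCL and FCL
    "fcl_scale_factor_gamma":  1.4,
    "gpus_zh_en":              1,
    "gpus_en_de":              4,
}
```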