Future-Guided Incremental Transformer for Simultaneous Translation

Authors: Shaolei Zhang, Yang Feng, Liangyou Li

AAAI 2021, pp. 14428-14436

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We conducted experiments on Chinese-English and German-English simultaneous translation tasks and compared with the wait-k policy to evaluate the proposed method.
Researcher Affiliation | Collaboration | 1) Key Laboratory of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences (ICT/CAS); 2) University of Chinese Academy of Sciences, Beijing, China; 3) Huawei Noah's Ark Lab
Pseudocode | No | The paper includes mathematical equations and architectural diagrams but no structured pseudocode or algorithm blocks.
Open Source Code | No | The paper does not contain any explicit statement about providing open-source code or a link to a code repository.
Open Datasets | Yes | Chinese-English (Zh-En): the training set consists of about 1.25M sentence pairs from LDC corpora. We use MT02 as the validation set and MT03, MT04, MT05, MT06, MT08 as the test sets, each with 4 English references. We first tokenize and lowercase the English sentences with Moses, and segment the Chinese sentences with the Stanford Segmentor. We apply BPE (Sennrich, Haddow, and Birch 2016) with 30K merge operations on all texts. German-English (De-En): the training set consists of about 4.5M sentence pairs from the WMT15 De-En task. (A sketch of this preprocessing pipeline follows the table.)
Dataset Splits | Yes | Chinese-English (Zh-En): ... We use MT02 as the validation set and MT03, MT04, MT05, MT06, MT08 as the test sets... German-English (De-En): ... We use news-test2013 (3,000 sentence pairs) as the validation set and news-test2015 (2,169 sentence pairs) as the test set.
Hardware Specification | No | The paper does not provide specific hardware details (exact GPU/CPU models, processor types, or memory amounts) used for running its experiments.
Software Dependencies | No | The paper mentions Moses, the Stanford Segmentor, and the Fairseq library but does not provide specific version numbers for these software components.
Experiment Setup | Yes | The parameters of the proposed incremental Transformer are exactly the same as those of the standard wait-k model (Ma et al. 2019), while the conventional Transformer is the same as the original Transformer (Vaswani et al. 2017). ... where λ is a hyper-parameter controlling the importance of the penalty term; the authors set λ = 0.1 in their experiments. (A sketch of the wait-k schedule and the λ-weighted objective follows the table.)
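
The preprocessing pipeline quoted under Open Datasets (Moses tokenization and lowercasing, BPE with 30K merge operations) is concrete enough to sketch. The snippet below is a minimal sketch, not the authors' actual scripts: it uses sacremoses and subword-nmt as Python stand-ins for the Moses scripts and the BPE implementation, assumes the Chinese side has already been segmented with the Stanford Segmentor, and uses hypothetical file names.

```python
# Minimal sketch of the English-side preprocessing described under
# Open Datasets. sacremoses stands in for the Moses tokenizer and
# subword-nmt for the BPE step; file names are hypothetical, and the
# paper learns BPE on all texts, not only the English side shown here.
from sacremoses import MosesTokenizer
from subword_nmt.learn_bpe import learn_bpe
from subword_nmt.apply_bpe import BPE

# Tokenize and lowercase the English sentences.
mt = MosesTokenizer(lang="en")
with open("train.en") as src, open("train.tok.en", "w") as out:
    for line in src:
        out.write(mt.tokenize(line.strip(), return_str=True).lower() + "\n")

# Learn a BPE model with 30K merge operations, then apply it.
with open("train.tok.en") as infile, open("bpe.codes", "w") as codes:
    learn_bpe(infile, codes, num_symbols=30000)

with open("bpe.codes") as codes:
    bpe = BPE(codes)
with open("train.tok.en") as infile, open("train.bpe.en", "w") as out:
    for line in infile:
        out.write(bpe.process_line(line))
```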
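
The Experiment Setup row references the wait-k policy (Ma et al. 2019) and a penalty term weighted by λ = 0.1. The sketch below shows the standard wait-k read/write schedule and the λ-weighting as quoted; the exact form of the paper's penalty term is not given in the excerpt, so penalty is a placeholder and all names are hypothetical.

```python
def wait_k_prefix(t: int, k: int, source_len: int) -> int:
    """Standard wait-k schedule (Ma et al. 2019): at target step t
    (1-indexed), the decoder may attend to the first
    min(t + k - 1, source_len) source tokens."""
    return min(t + k - 1, source_len)


def weighted_objective(translation_loss: float, penalty: float,
                       lam: float = 0.1) -> float:
    """Lambda-weighted objective as quoted above (lambda = 0.1 in the
    paper); the exact penalty term is not specified in the excerpt, so
    only the weighting is shown."""
    return translation_loss + lam * penalty


# Example: with k = 3 and a 5-token source, the first target token is
# emitted after reading 3 source tokens, then reading and writing
# alternate until the whole source has been consumed.
for t in range(1, 7):
    print(f"target step {t}: source prefix length "
          f"{wait_k_prefix(t, k=3, source_len=5)}")
```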