Translating with Bilingual Topic Knowledge for Neural Machine Translation

Authors: Xiangpeng Wei, Yue Hu, Luxi Xing, Yipeng Wang, Li Gao

AAAI 2019, pp. 7257-7264

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental results show that the proposed model consistently outperforms the traditional RNNsearch and the previous topic-informed NMT on Chinese-English and English-German translation tasks.
Researcher Affiliation | Collaboration | (1) Institute of Information Engineering, Chinese Academy of Sciences, Beijing, China; (2) School of Cyber Security, University of Chinese Academy of Sciences, Beijing, China; (3) Platform & Content Group, Tencent, Beijing, China
Pseudocode | No | The paper describes the model architecture and mathematical formulations but does not include structured pseudocode or algorithm blocks.
Open Source Code | No | The paper refers to an open-source NMT toolkit (DL4MT) used for comparison, but it does not provide concrete access to the source code for the proposed BLT-NMT model.
Open Datasets | Yes | "We have built two document-aligned corpora from Wikipedia comparable corpora [1] for Chinese-English and English-German language pairs, respectively. ... The parallel training data consists of 1.25M sentence pairs extracted from LDC corpora [2] ... we use the same subset of WMT 2014 training corpus ..." Footnote 1: http://linguatools.org/tools/corpora/wikipedia-comparable-corpora/ Footnote 2: LDC2002E18, LDC2003E07, LDC2003E14, the Hansards portion of LDC2004T07, LDC2004T08, and LDC2005T06
Dataset Splits | Yes | "We choose NIST 2002 (NIST02) as the development set ... For English-German, the concatenation of newstest2012 and newstest2013 is used as the development set ..."
Hardware Specification | Yes | "We train our BLT-NMT on a single NVIDIA Titan X GPU."
Software Dependencies | No | The paper mentions evaluation scripts (mteval-v11b.pl, multi-bleu.perl) and the Stanford Chinese word segmenter (Tseng et al. 2005) but does not provide specific version numbers for software dependencies or the overall software environment.
Experiment Setup | Yes | "For all experiments, we set the following hyper-parameters: word embedding dimension as 512, hidden layer size as 1024, batch size as 80, gradient norm as 5.0, dropout rate as 0.3 and beam width as 10. ... We use the Adam optimizer with β1 = 0.9, β2 = 0.98, and ε = 10^-9, and follow the same learning rate schedule as in (Vaswani et al. 2017)."
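
As a quick-start aid for reproduction, below is a minimal sketch of how the quoted optimizer settings could be wired up in PyTorch. This is not the authors' implementation (their baselines build on DL4MT); the placeholder model, the d_model value of 512, and the warmup_steps value of 4000 are assumptions carried over from Vaswani et al. (2017), while the Adam betas, epsilon, and the shape of the learning-rate schedule come from the experiment setup reported above.

```python
# Minimal sketch (not the authors' code) of the reported optimizer settings.
# d_model and warmup_steps are assumed defaults from Vaswani et al. (2017).
import torch

def transformer_lr(step, d_model=512, warmup_steps=4000):
    """Learning-rate schedule from Vaswani et al. (2017):
    lr = d_model^-0.5 * min(step^-0.5, step * warmup_steps^-1.5)."""
    step = max(step, 1)
    return d_model ** -0.5 * min(step ** -0.5, step * warmup_steps ** -1.5)

model = torch.nn.Linear(512, 1024)  # placeholder for the actual NMT model
optimizer = torch.optim.Adam(model.parameters(),
                             lr=1.0,             # base lr scaled by the schedule
                             betas=(0.9, 0.98),  # β1, β2 as reported
                             eps=1e-9)           # ε as reported
scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda=transformer_lr)

# Inside the training loop: loss.backward(); optimizer.step(); scheduler.step()
```

The remaining reported hyper-parameters (batch size 80, dropout 0.3, gradient-norm clipping at 5.0, beam width 10) belong to the data loader, model, and decoder rather than the optimizer, so they are not shown here.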