Towards Discourse-Aware Document-Level Neural Machine Translation

Authors: Xin Tan, Longyin Zhang, Fang Kong, Guodong Zhou

IJCAI 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We conduct experiments on the document-level English-German and English-Chinese translation tasks with three domains (TED, News, and Europarl). Experimental results show that our Disco2NMT model significantly surpasses both context-agnostic and context-aware baseline systems on multiple evaluation indicators.
Researcher Affiliation | Academia | Xin Tan, Longyin Zhang, Fang Kong and Guodong Zhou, School of Computer Science and Technology, Soochow University, China; {xtan9, lyzhang9}@stu.suda.edu.cn, {kongfang, gdzhou}@suda.edu.cn
Pseudocode | No | The paper describes its methods using natural language and mathematical equations, but does not include any structured pseudocode or algorithm blocks.
Open Source Code | No | The paper provides links to datasets and third-party tools, but there is no explicit statement or link indicating that the source code for the Disco2NMT model itself is publicly available.
Open Datasets | Yes | Datasets. We conduct several experiments on the English-German and English-Chinese translation tasks with corpora from the following three domains: TED (En-De/En-Zh): For English-German, we use TED talks from the IWSLT2017 [Cettolo et al., 2012] evaluation campaigns as the training corpus... News (En-De/En-Zh): For English-German, we take News Commentary V11 as our training corpus... Europarl (En-De): The corpus is extracted from Europarl V7 following Maruf and Haffari [2018].
Dataset Splits | Yes | For English-German, we use TED talks from the IWSLT2017 [Cettolo et al., 2012] evaluation campaigns as the training corpus, tst2016-2017 as the test corpus, and the rest as the development corpus.
Hardware Specification | No | The paper does not provide any specific details about the hardware used for running the experiments (e.g., GPU models, CPU types, memory).
Software Dependencies | No | The paper mentions software such as the Moses Toolkit, subword-nmt, Transformer, the Adam optimizer, and Fasttext, but it does not specify version numbers for any of these components, which is required for reproducibility.
Experiment Setup | Yes | Specifically, the hidden size and filter size were set to 512 and 2048, respectively. Both encoder and decoder were composed of 6 hidden layers. The source and target vocabulary size were set to 30K. The beam size and dropout rate were set to 5 and 0.1, respectively. We used the Adam optimizer to train our model.
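
The reported hyperparameters match a standard Transformer-base configuration. The sketch below is an illustration only, assembled with PyTorch's nn.Transformer; the paper does not state which framework it used, and the number of attention heads (8) and the learning rate are assumptions that do not appear in the quoted setup.

```python
# Hypothetical sketch of the reported experiment setup using PyTorch.
# The paper does not name its framework; nhead=8 and lr=1e-4 are assumptions.
import torch
import torch.nn as nn

SRC_VOCAB = 30_000   # "source and target vocabulary size were set to 30K"
TGT_VOCAB = 30_000
HIDDEN = 512         # "hidden size ... set to 512"
FFN = 2048           # "filter size ... 2048"
LAYERS = 6           # "Both encoder and decoder were composed of 6 hidden layers"
DROPOUT = 0.1        # "dropout rate ... 0.1"
BEAM = 5             # beam size 5 applies at decoding time, not during training

src_embed = nn.Embedding(SRC_VOCAB, HIDDEN)
tgt_embed = nn.Embedding(TGT_VOCAB, HIDDEN)
model = nn.Transformer(
    d_model=HIDDEN,
    nhead=8,                     # assumption: standard Transformer-base value
    num_encoder_layers=LAYERS,
    num_decoder_layers=LAYERS,
    dim_feedforward=FFN,
    dropout=DROPOUT,
    batch_first=True,
)
generator = nn.Linear(HIDDEN, TGT_VOCAB)

# "We used the Adam optimizer to train our model."
params = (list(src_embed.parameters()) + list(tgt_embed.parameters())
          + list(model.parameters()) + list(generator.parameters()))
optimizer = torch.optim.Adam(params, lr=1e-4)  # learning rate is an assumption
```

Positional encodings, target-side masking, and the document-level context components that distinguish Disco2NMT are omitted here; the sketch only records the sentence-level hyperparameters quoted above.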