Effective Graph Context Representation for Document-level Machine Translation

Authors: Kehai Chen, Muyun Yang, Masao Utiyama, Eiichiro Sumita, Rui Wang, Min Zhang

IJCAI 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental results on several widely-used document-level benchmarks demonstrated the effectiveness of the proposed approach.
Researcher Affiliation | Academia | 1 Harbin Institute of Technology, Shenzhen, China; 2 Harbin Institute of Technology, Harbin, China; 3 National Institute of Information and Communications Technology, Kyoto, Japan; 4 Shanghai Jiao Tong University, Shanghai, China
Pseudocode | No | The paper describes the methods using mathematical formulas and text, but does not include any explicit pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide an explicit statement about, or a link to, open-source code for the described methodology.
Open Datasets | Yes | For TED Talks in IWSLT17, we used dev-2010 as the development set and test-2010/2011/2012/2013 as the test sets for both Chinese-English (Zh-En) and English-German (En-De) language pairs. For News-Commentary v14 (News), we used newstest2017 for development and newstest2018 for testing on both Zh-En and En-De. Our approach was also evaluated on the large-scale Euro corpus extracted from Europarl v7 [Maruf et al., 2019].
Dataset Splits | Yes | For TED Talks in IWSLT17, we used dev-2010 as the development set and test-2010/2011/2012/2013 as the test sets for both Chinese-English (Zh-En) and English-German (En-De) language pairs. ... After training for 100,000 batches, we used a single model obtained by averaging the last five checkpoints, validating the model on the dev set every 2,000 batches. (The splits are summarized in the sketch after this table.)
Hardware Specification | Yes | We trained all models on eight V100 GPUs and evaluated them on a single V100 GPU.
Software Dependencies | No | The paper mentions using the 'fairseq toolkit' but does not specify its version number or other software dependencies with versions.
Experiment Setup | Yes | We set the dimension of all input and output layers to 512, the dimension of the inner feed-forward layer to 1024, and the number of heads in all multi-head modules to 8 in both the encoder and decoder layers. The number of multi-hop reasoning steps N was set to 2 empirically. Each training batch consisted of a set of sentence pairs containing approximately 4,000×8 source tokens and 4,000×8 target tokens. Label smoothing was set to 0.1, and the attention dropout and residual dropout were 0.1. We varied the learning rate under a warm-up strategy with 8,000 warm-up steps. After training for 100,000 batches, we used a single model obtained by averaging the last five checkpoints, validating the model on the dev set every 2,000 batches. (These settings are collected in the configuration sketch after this table.)
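
For readability, the dev/test assignments quoted in the Dataset Splits row can be collected into a small summary. The dictionary below is only an illustrative restatement of the splits named above; the key names and structure are assumptions, since the paper releases no code or data-preparation scripts.

```python
# Illustrative restatement of the dev/test splits quoted above.
# Key names and structure are assumptions; no official scripts are released.
DOC_MT_SPLITS = {
    "TED (IWSLT17)": {
        "pairs": ["Zh-En", "En-De"],
        "dev": ["dev-2010"],
        "test": ["test-2010", "test-2011", "test-2012", "test-2013"],
    },
    "News-Commentary v14 (News)": {
        "pairs": ["Zh-En", "En-De"],
        "dev": ["newstest2017"],
        "test": ["newstest2018"],
    },
    "Euro (Europarl v7)": {
        # Splits are not spelled out in the excerpt; the paper follows
        # Maruf et al. (2019) for this corpus.
        "pairs": None,
        "dev": None,
        "test": None,
    },
}
```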
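
The Experiment Setup row reads as a standard Transformer-base-style configuration trained with fairseq. The sketch below restates those reported values in plain PyTorch; it is not the authors' implementation, it omits the paper's graph-context encoder and multi-hop reasoning module, and the optimizer choice, peak learning rate, and warm-up rule are assumptions based on common practice.

```python
import torch
import torch.nn as nn

# Values quoted in the Experiment Setup row; everything else is illustrative.
D_MODEL, FFN_DIM, HEADS, DROPOUT = 512, 1024, 8, 0.1
LABEL_SMOOTHING, WARMUP_STEPS = 0.1, 8_000
N_HOPS = 2  # reported multi-hop reasoning steps; the reasoning module itself is not sketched here

# A plain Transformer with the reported shared settings (the same 0.1 is used
# for attention and residual dropout, matching the quoted values).
model = nn.Transformer(
    d_model=D_MODEL,
    nhead=HEADS,
    dim_feedforward=FFN_DIM,
    dropout=DROPOUT,
)

# Label-smoothed cross-entropy with the reported smoothing value of 0.1.
criterion = nn.CrossEntropyLoss(label_smoothing=LABEL_SMOOTHING)

# Inverse-square-root schedule with 8,000 warm-up steps, the usual
# "Attention Is All You Need"-style rule; the peak LR of 7e-4 is an assumption.
optimizer = torch.optim.Adam(model.parameters(), lr=1.0)

def inverse_sqrt_lr(step: int, base_lr: float = 7e-4) -> float:
    step = max(step, 1)
    return base_lr * min(step ** -0.5, step * WARMUP_STEPS ** -1.5) * WARMUP_STEPS ** 0.5

scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda=inverse_sqrt_lr)
```

The remaining quoted details (roughly 4,000 tokens per GPU across eight GPUs, 100,000 training batches, validation every 2,000 batches, and averaging the last five checkpoints) belong to the training loop and are not shown.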