Effective Graph Context Representation for Document-level Machine Translation
Authors: Kehai Chen, Muyun Yang, Masao Utiyama, Eiichiro Sumita, Rui Wang, Min Zhang
IJCAI 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results on several widely-used document-level benchmarks demonstrated the effectiveness of the proposed approach. |
| Researcher Affiliation | Academia | ¹Harbin Institute of Technology, Shenzhen, China; ²Harbin Institute of Technology, Harbin, China; ³National Institute of Information and Communications Technology, Kyoto, Japan; ⁴Shanghai Jiao Tong University, Shanghai, China |
| Pseudocode | No | The paper describes the methods using mathematical formulas and text, but does not include any explicit pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide an explicit statement or link for the open-source code of the described methodology. |
| Open Datasets | Yes | For TED Talks in IWSLT17, we used dev-2010 as the development set, and test-2010/2011/2012/2013 as the test sets for both Chinese-English (Zh-En) and English-German (En-De) language pairs. For News-Commentary v14 (News), we use the newstest2017 for development and newstest2018 to test both Zh-En and En-De language pairs. Our approach was also evaluated on a large-scale corpus Euro extracted from Europarl v7 [Maruf et al., 2019]. |
| Dataset Splits | Yes | For TED Talks in IWSLT17, we used dev-2010 as the development set, and test-2010/2011/2012/2013 as the test sets for both Chinese-English (Zh-En) and English-German (En-De) language pairs. ... Following the training of 100,000 batches, we used a single model obtained by averaging the last five checkpoints, which validated the model with an interval of 2,000 batches on the dev set. |
| Hardware Specification | Yes | We trained all models on eight V100 GPUs and evaluated them on a single V100 GPU. |
| Software Dependencies | No | The paper mentions using the 'fairseq toolkit' but does not specify its version number or other software dependencies with versions. |
| Experiment Setup | Yes | We set the dimension of all input and output layers to 512, the dimension of the inner feedforward neural network layer to 1024, and the total heads of all multi-head modules to 8 in both the encoder and decoder layers. The number of multi-hop reasoning N was set to 2 empirically. Each training batch consisted of a set of sentence pairs that contained approximately 4,000 × 8 source tokens and 4,000 × 8 target tokens. The value of label smoothing was set to 0.1, and the attention dropout and residual dropout were 0.1. We varied the learning rate under a warm-up strategy with warmup steps of 8,000. Following the training of 100,000 batches, we used a single model obtained by averaging the last five checkpoints, which validated the model with an interval of 2,000 batches on the dev set. |
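
The dataset and split rows above map naturally onto a standard fairseq preprocessing call. The sketch below is a minimal illustration under stated assumptions: sentence-aligned plain-text files with hypothetical prefixes (`train`, `dev2010`, `tst2010`–`tst2013`), the Zh-En direction, and none of the document-level context handling the paper's method requires; the authors' actual preprocessing pipeline is not released.

```python
# Minimal sketch: binarize the IWSLT17 Zh-En splits described above with
# fairseq-preprocess. The file prefixes are hypothetical; the paper does not
# publish its preprocessing scripts or document-boundary handling.
import subprocess

splits = {
    "trainpref": "iwslt17.zh-en/train",
    "validpref": "iwslt17.zh-en/dev2010",  # dev-2010 as the development set
    "testpref": ",".join(
        f"iwslt17.zh-en/tst{year}" for year in (2010, 2011, 2012, 2013)
    ),  # test-2010/2011/2012/2013 as the test sets
}

subprocess.run(
    [
        "fairseq-preprocess",
        "--source-lang", "zh", "--target-lang", "en",
        "--trainpref", splits["trainpref"],
        "--validpref", splits["validpref"],
        "--testpref", splits["testpref"],
        "--destdir", "data-bin/iwslt17.zh-en",
        "--workers", "8",
    ],
    check=True,
)
```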
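The Experiment Setup row translates almost line by line into fairseq-train arguments. The following sketch is a hedged reconstruction, not the authors' configuration: it assumes a plain Transformer architecture in place of the unreleased graph-context model, an Adam optimizer with an inverse-sqrt schedule and an assumed peak learning rate of 7e-4 (not stated in the paper), and 4,000 tokens per GPU on 8 GPUs to match the 4,000 × 8 batch description.

```python
# Hedged reconstruction of the quoted training setup as a fairseq-train call.
# "--arch transformer" is a stand-in; the paper's graph-context encoder is
# not available as open source.
import subprocess

train_cmd = [
    "fairseq-train", "data-bin/iwslt17.zh-en",
    "--arch", "transformer",
    "--encoder-embed-dim", "512", "--decoder-embed-dim", "512",      # model dim 512
    "--encoder-ffn-embed-dim", "1024", "--decoder-ffn-embed-dim", "1024",
    "--encoder-attention-heads", "8", "--decoder-attention-heads", "8",
    "--criterion", "label_smoothed_cross_entropy",
    "--label-smoothing", "0.1",
    "--dropout", "0.1", "--attention-dropout", "0.1",
    "--optimizer", "adam",
    "--lr", "7e-4",                                                  # assumed value, not given in the paper
    "--lr-scheduler", "inverse_sqrt",
    "--warmup-updates", "8000",                                      # 8,000 warm-up steps
    "--max-tokens", "4000",                                          # ~4,000 tokens per GPU, x8 GPUs
    "--max-update", "100000",                                        # 100,000 training batches
    "--save-interval-updates", "2000",                               # checkpoint/validate every 2,000 batches
    "--keep-interval-updates", "5",
]
subprocess.run(train_cmd, check=True)

# Average the last five checkpoints into a single model, mirroring the quoted
# procedure (run from inside a fairseq checkout).
subprocess.run(
    [
        "python", "scripts/average_checkpoints.py",
        "--inputs", "checkpoints",
        "--num-update-checkpoints", "5",
        "--output", "checkpoints/model.avg5.pt",
    ],
    check=True,
)
```

With this mapping, the quoted "averaging the last five checkpoints" corresponds to keeping the five most recent interval checkpoints (every 2,000 updates) and averaging them after 100,000 updates.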