Efficient Context-Aware Neural Machine Translation with Layer-Wise Weighting and Input-Aware Gating
Authors: Hongfei Xu, Deyi Xiong, Josef van Genabith, Qiuhui Liu
IJCAI 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conducted experiments to validate the performance and efficiency of our approach. Our approach was implemented based on the Neutron implementation [Xu and Liu, 2019] of the Transformer translation model. To compare with Voita et al. [2019c], we used the same corpus, which is based on the publicly available OpenSubtitles2018 corpus [Lison et al., 2018] for English and Russian. The corpus consists of 6 million training instances, among which 1.5 million have contexts of three sentences. We also compared our approach with Zhang et al. [2018]. In addition to tokenized BLEU, we also performed linguistic evaluations on the contrastive test sets [Voita et al., 2019c], which are specifically designed to test the ability of a system to adapt to contextual information in handling frequent discourse phenomena (i.e., deixis, lexical cohesion, VP and inflection ellipses) in context-aware translation (a sketch of this contrastive scoring appears after the table). |
| Researcher Affiliation | Collaboration | Hongfei Xu (1,2), Deyi Xiong (3), Josef van Genabith (1,2) and Qiuhui Liu (4); 1: Saarland University, Germany; 2: German Research Center for Artificial Intelligence, Germany; 3: Tianjin University, China; 4: China Mobile Online Services, China |
| Pseudocode | No | The paper includes architectural diagrams (Figure 1 and 2) but does not provide any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | The paper states: "Our approach was implemented based on the Neutron implementation [Xu and Liu, 2019] of the Transformer translation model." While Neutron itself is open source, the paper does not explicitly provide a link or statement for the specific code implementing the 'Efficient Context-Aware Neural Machine Translation with Layer-Wise Weighting and Input-Aware Gating' described in this paper. |
| Open Datasets | Yes | To compare with Voita et al. [2019c], we used the same corpus, which is based on the publicly available OpenSubtitles2018 corpus [Lison et al., 2018] for English and Russian. |
| Dataset Splits | No | The paper mentions the use of the OpenSubtitles2018 corpus and specifies the total number of training instances (6 million), but it does not explicitly provide the train/validation/test dataset splits (e.g., percentages or sample counts) for the main NMT task. It refers to 'contrastive test sets' for linguistic evaluations without specifying their size or how they relate to the main corpus splits. |
| Hardware Specification | Yes | All models were trained on 2 GTX 1080 Ti GPUs, and translation was performed on 1 GPU. |
| Software Dependencies | No | The paper mentions implementation based on "Neutron" but does not provide specific version numbers for any software dependencies or libraries (e.g., Python, PyTorch, TensorFlow, CUDA). |
| Experiment Setup | Yes | For fairness, we followed the setting of Voita et al. [2019c] to use 3 previous context sentences. Corresponding to randomly masking 20% of the tokens in 50% of the sentences with random tokens, we used a token dropout of 0.1. We employed h = 8 parallel attention heads. The dimension of input and output (d_model) was 512, and the hidden dimension of the feed-forward networks was 2048. We used 0.1 as both the dropout probability and the label smoothing value. For the Adam optimizer, we used 0.9, 0.98 and 10^-9 as β1, β2 and ϵ, and all context-aware models were trained for 200k training steps following Voita et al. [2019c]. (A configuration sketch collecting these values follows the table.) |
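
The hyperparameters reported under Experiment Setup can be gathered into a small configuration sketch. This is a minimal illustration, not the paper's (unreleased) Neutron-based code; all names below are hypothetical, and the optimizer mapping assumes a PyTorch training loop.

```python
# A minimal configuration sketch, assuming a PyTorch training loop; the field
# names are hypothetical and do not come from the paper's Neutron-based code.
from dataclasses import dataclass

@dataclass
class ContextAwareNMTConfig:
    d_model: int = 512            # input/output dimension (d_model)
    d_ffn: int = 2048             # hidden dimension of the feed-forward networks
    n_heads: int = 8              # parallel attention heads (h)
    dropout: float = 0.1          # dropout probability
    label_smoothing: float = 0.1
    token_dropout: float = 0.1    # probability of replacing an input token with a random one
    context_sentences: int = 3    # previous sentences used as context
    adam_betas: tuple = (0.9, 0.98)   # (beta1, beta2)
    adam_eps: float = 1e-9            # epsilon
    training_steps: int = 200_000

cfg = ContextAwareNMTConfig()

# Given a torch.nn.Module `model`, the optimizer settings map directly onto PyTorch's Adam:
# optimizer = torch.optim.Adam(model.parameters(), betas=cfg.adam_betas, eps=cfg.adam_eps)
```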
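
The contrastive test sets of Voita et al. [2019c] referenced under Research Type score a model by whether the correct translation outranks minimally different contrastive variants. The sketch below shows that accuracy computation under stated assumptions: `score_translation` is a hypothetical callable (e.g., returning the summed target log-probability given the source and its context) and is not part of any released artifact.

```python
# A minimal sketch of contrastive evaluation in the style of Voita et al. [2019c].
# `score_translation` is a hypothetical callable returning the model's score
# for a candidate translation given the source sentence and its context.
from typing import Callable, Sequence

def contrastive_accuracy(
    examples: Sequence[dict],
    score_translation: Callable[[str, str, str], float],
) -> float:
    """Fraction of instances where the correct translation outscores all
    contrastive variants (which differ only in the tested discourse phenomenon:
    deixis, lexical cohesion, VP ellipsis or inflection ellipsis)."""
    correct = 0
    for ex in examples:
        candidates = [ex["target"]] + list(ex["contrastive_variants"])
        scores = [score_translation(ex["source"], ex["context"], c) for c in candidates]
        if scores[0] == max(scores):  # the reference translation must rank first
            correct += 1
    return correct / len(examples)
```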