Improving Context-Aware Neural Machine Translation Using Self-Attentive Sentence Embedding

Authors: Hyeongu Yun, Yongkeun Hwang, Kyomin Jung (pp. 9498-9506)

AAAI 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Through extensive experiments, we observe that our HCE records the best performance measured in BLEU score on English-German, English-Turkish, and English-Korean corpus. In addition, we observe that our HCE records the best performance in a crowd-sourced test set which is designed to evaluate how well an encoder can exploit contextual information. Overall BLEU scores on all eight datasets are displayed in Table 2. Our model yields the best performances on all eight datasets.
Researcher Affiliation | Academia | Hyeongu Yun (1), Yongkeun Hwang (1), Kyomin Jung (1,2); (1) Seoul National University, Seoul, Korea; (2) Automation and Systems Research Institute, Seoul National University, Seoul, Korea; {youaredead, wangcho2k, kjung}@snu.ac.kr
Pseudocode | No | The paper describes the model architecture and components but does not include any explicit pseudocode or algorithm blocks.
Open Source Code | No | The paper states, "We plan to release both the crowd-sourced evaluation set and the pronoun resolution test suite," which refers to data. It mentions using the "tensor2tensor framework" and "t2t-bleu script" but does not explicitly state that the authors' own implementation code for the proposed method is open-source, nor does it provide a link to it.
Open Datasets | Yes | We use the English-German corpus from the IWSLT 2017 evaluation campaign (Cettolo et al. 2017), which is publicly available on the WIT3 website. We also choose the Open Subtitles corpus for the English-German and English-Turkish tasks. We use the 2018 version (Lison, Tiedemann, and Kouylekov 2018) of the data.
Dataset Splits | Yes | We combine dev2010 and tst2010 into a development (dev) set and use tst2015 as a test set. The resulting dataset consists of 211k, 2.4k, and 1.1k examples in the train, dev, and test sets respectively. The train set includes 3.0M sentences, the dev set includes 28.8k sentences, and the test set includes 31.1k sentences.
Hardware Specification | No | The paper does not provide specific details about the hardware used for running experiments, such as GPU models, CPU types, or memory specifications.
Software Dependencies | No | The paper mentions using the "tensor2tensor framework", the "t2t-bleu script", and the "Moses script", but it does not provide specific version numbers for these software components.
Experiment Setup | Yes | Through our experiments, we use 512 hidden dimensions for all layers including the word embedding layers, FAN layers, and the encoded context layer. We set NLayer = 6 for all models and share the weights of the source encoder with the context encoder for the DAT, HAN, and HCE models. For all attention mechanisms, we set the number of heads to 8. The dropout rate of each FAN layer is set to 0.1. We train all models with the ADAM (Kingma and Ba 2014) optimizer with learning rate 1e-3 and adopt early stopping on the dev loss.
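The hyperparameters reported above can be collected into a single configuration object, which is often the first step in a reproduction attempt. The sketch below is a minimal, hypothetical summary of the reported setup; the class and field names are ours, not from the authors' (unreleased) code, and "FAN" is reproduced as printed in the paper.

```python
from dataclasses import dataclass

@dataclass
class HCEConfig:
    """Hypothetical config summarizing the paper's reported experiment setup."""
    hidden_dim: int = 512            # 512 hidden dims for all layers (embeddings, FAN layers, encoded context)
    num_layers: int = 6              # NLayer = 6 for all models
    num_heads: int = 8               # 8 heads for all attention mechanisms
    dropout: float = 0.1             # dropout rate of each FAN layer
    learning_rate: float = 1e-3      # ADAM optimizer learning rate
    share_encoder_weights: bool = True  # source encoder weights shared with context encoder (DAT/HAN/HCE)
    early_stopping_metric: str = "dev_loss"  # early stopping on dev loss

cfg = HCEConfig()
print(cfg.hidden_dim, cfg.num_layers, cfg.num_heads)
```

Note that hardware details and library versions are not reported, so a faithful reproduction would still need to fix those independently.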