Improving Stylized Neural Machine Translation with Iterative Dual Knowledge Transfer
Authors: Xuanxuan Wu, Jian Liu, Xinjie Li, Jinan Xu, Yufeng Chen, Yujie Zhang, Hui Huang
IJCAI 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiment results demonstrate the effectiveness of our method, achieving a 5-BLEU-point improvement over the existing best model on the MTFC dataset. We conduct experiments on two benchmark datasets, MTFC and GYAFC, and achieve state-of-the-art results in producing stylized translation sentences according to both automatic and human evaluation. |
| Researcher Affiliation | Collaboration | ¹Beijing Jiaotong University, Beijing, China; ²Global Tone Communication Technology Co., Ltd., Beijing, China |
| Pseudocode | Yes | Algorithm 1: Iterative Dual Knowledge Transfer for Improving Stylized NMT (a hypothetical sketch of such a loop appears after this table) |
| Open Source Code | Yes | Code and data are available at https://github.com/mt887/IDKT |
| Open Datasets | Yes | We use two datasets to evaluate our proposed method; the size of each training dataset is presented in Table 2. MTFC: Machine Translation Formality Corpus (MTFC)... GYAFC: We use Grammarly's Yahoo Corpus Dataset (GYAFC)... Code and data are available at https://github.com/mt887/IDKT |
| Dataset Splits | Yes | The size of each training dataset is presented in Table 2: GYAFC (E&M): 52k train, 2877 valid, 1416 test; GYAFC (F&R): 52k train, 2788 valid, 1432 test; MTFC: 14280k train, 2877 valid, 1416 test. |
| Hardware Specification | Yes | We train our models with the Adam [Kingma and Ba, 2015] optimizer using β1 = 0.9, β2 = 0.98 on 2 NVIDIA 2080Ti GPUs. |
| Software Dependencies | No | The paper mentions software tools like Fairseq, BART-large, and BERT, but does not provide specific version numbers for these software components. |
| Experiment Setup | Yes | The dimensionality of all input and output layers is 1024, and that of the FFN layer is 4096. Both the encoder and decoder have 6 layers with 8 attention heads. We train our models with the Adam [Kingma and Ba, 2015] optimizer using β1 = 0.9, β2 = 0.98. (A configuration sketch follows the table.) |
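The table above only names Algorithm 1; its actual update rules are not quoted here. The Python sketch below shows the general shape an iterative dual-knowledge-transfer loop between an NMT model and a style-transfer model could take. Every class, method, and variable name (`StubModel`, `generate`, `fine_tune`, `bitext`) is a hypothetical placeholder, not the authors' API; the authoritative procedure is Algorithm 1 in the paper and the released code at https://github.com/mt887/IDKT.

```python
# Hypothetical sketch of an iterative dual knowledge transfer loop for
# stylized NMT. Every name here is a placeholder; the authors' real
# procedure is Algorithm 1 in the paper.

class StubModel:
    """Stand-in for a seq2seq model (NMT or formality transfer)."""

    def __init__(self, name):
        self.name = name

    def generate(self, sentence):
        # Identity stand-in for beam-search decoding.
        return sentence

    def fine_tune(self, pairs):
        print(f"{self.name}: fine-tuned on {len(pairs)} pairs")


def iterative_dual_knowledge_transfer(nmt, style, bitext, rounds=3):
    """Alternate knowledge transfer between an NMT model and a
    style-transfer model via pseudo-parallel data."""
    for _ in range(rounds):
        # NMT -> style transfer: pair NMT outputs with their stylized
        # rewrites to give the style model in-domain training data.
        translations = [nmt.generate(src) for src, _ in bitext]
        style.fine_tune([(t, style.generate(t)) for t in translations])

        # Style transfer -> NMT: stylize the references so the NMT model
        # is fine-tuned toward the target style.
        nmt.fine_tune([(src, style.generate(tgt)) for src, tgt in bitext])
    return nmt, style


if __name__ == "__main__":
    bitext = [("bonjour", "hello"), ("merci beaucoup", "thanks a lot")]
    iterative_dual_knowledge_transfer(StubModel("nmt"), StubModel("style"), bitext)
```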
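The Experiment Setup and Hardware rows map onto standard Transformer hyperparameters. Below is a minimal PyTorch re-expression of the reported settings; the paper builds on Fairseq, so using `torch.nn.Transformer` and the placeholder learning rate are assumptions for illustration only.

```python
import torch
from torch import nn

# Transformer matching the reported dimensions: 1024-d input/output layers,
# 4096-d FFN, 6 encoder and 6 decoder layers, 8 attention heads each.
model = nn.Transformer(
    d_model=1024,
    nhead=8,
    num_encoder_layers=6,
    num_decoder_layers=6,
    dim_feedforward=4096,
)

# Adam with the reported betas; the learning rate and schedule are not
# quoted in the table above, so lr here is a placeholder assumption.
optimizer = torch.optim.Adam(model.parameters(), lr=5e-4, betas=(0.9, 0.98))
```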