Finding Sparse Structures for Domain Specific Neural Machine Translation
Authors: Jianze Liang, Chengqi Zhao, Mingxuan Wang, Xipeng Qiu, Lei Li (pp. 13333-13342)
AAAI 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results show that PRUNE-TUNE outperforms several strong competitors on the target domain test sets without sacrificing quality on the general domain, in both single- and multi-domain settings. |
| Researcher Affiliation | Collaboration | Jianze Liang (1,2), Chengqi Zhao (2), Mingxuan Wang (2), Xipeng Qiu (1), Lei Li (2); 1: School of Computer Science, Fudan University, Shanghai, China; 2: ByteDance AI Lab, China |
| Pseudocode | No | The paper does not contain any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | The source code and data are available at https://github.com/ohlionel/Prune-Tune. |
| Open Datasets | Yes | For TED talks, we used IWSLT14 as the training corpus... For the biomedicine domain, we evaluated on the EMEA News Crawl dataset. As there were no official validation and test sets for EMEA, we used the Khresmoi Medical Summary Translation Test Data 2.0. For the novel domain, we used a book dataset from OPUS (Tiedemann 2012)... For ZH→EN, we used the training corpora from the WMT19 ZH→EN translation task as the general domain data. We selected 6 target domain datasets from UM-Corpus (Tian et al. 2014). |
| Dataset Splits | Yes | Per-corpus splits (train / dev / test): EN→DE — WMT14: 3.9M / 3,000 / 3,003; IWSLT14: 170k / 6,750 / 1,305; EMEA: 587k / 500 / 1,000; Novel: 50k / 1,015 / 1,031. ZH→EN — WMT19: 20M / 3,000 / 3,981; Laws: 220k / 800 / 456; Thesis: 300k / 800 / 625; Subtitles: 300k / 800 / 598; Education: 449K / 800 / 791; News: 449K / 800 / 1,500; Spoken: 219k / 800 / 456. |
| Hardware Specification | Yes | All models were trained with a global batch size of 32,768 on NVIDIA Tesla V100 GPUs. |
| Software Dependencies | No | The paper mentions software such as 'sentencepiece', 'jieba', the Moses tokenizer, byte pair encoding (BPE), the Transformer, the Adam optimizer, and 'multi-bleu.perl', but does not provide specific version numbers for these software dependencies. |
| Experiment Setup | Yes | The embedding dimension was 1,024 and the size of the FFN hidden units was 4,096. The number of attention heads was set to 16 for both self-attention and cross-attention. We used the Adam optimizer (Kingma and Ba 2015) with the same schedule algorithm as Vaswani et al. (2017). All models were trained with a global batch size of 32,768... During inference, we used a beam width of 4 for both EN→DE and ZH→EN, and we set the length penalty to 0.6 for EN→DE and 1.0 for ZH→EN. A hedged configuration sketch is given below the table. |
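
Below is a minimal Python sketch of the training and inference configuration reported in the Experiment Setup row. The class and function names, the dataclass layout, and the warmup value of 4,000 steps are illustrative assumptions (the paper only states that it reuses the schedule of Vaswani et al. (2017)); the numeric hyperparameters are taken directly from the quoted text.

```python
# Hedged sketch of the reported training/inference configuration.
# Only the numbers quoted in the Experiment Setup row are grounded in the paper;
# names and the warmup default are illustrative assumptions.

from dataclasses import dataclass


@dataclass
class TransformerBigConfig:
    # Values quoted from the paper.
    embed_dim: int = 1024           # embedding dimension
    ffn_hidden_dim: int = 4096      # feed-forward hidden size
    num_attention_heads: int = 16   # self- and cross-attention heads
    global_batch_tokens: int = 32_768
    beam_size: int = 4
    length_penalty_en_de: float = 0.6
    length_penalty_zh_en: float = 1.0
    # Assumption: warmup steps are not stated in the section; 4,000 is the
    # value used by Vaswani et al. (2017).
    warmup_steps: int = 4000


def noam_learning_rate(step: int, d_model: int = 1024, warmup_steps: int = 4000) -> float:
    """Inverse-square-root schedule from Vaswani et al. (2017), which the
    paper reports reusing together with the Adam optimizer."""
    step = max(step, 1)
    return d_model ** -0.5 * min(step ** -0.5, step * warmup_steps ** -1.5)


if __name__ == "__main__":
    cfg = TransformerBigConfig()
    print(cfg)
    print("learning rate at step 1,000:",
          noam_learning_rate(1000, cfg.embed_dim, cfg.warmup_steps))
```
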