Finding Sparse Structures for Domain Specific Neural Machine Translation
Authors: Jianze Liang, Chengqi Zhao, Mingxuan Wang, Xipeng Qiu, Lei Li (pp. 13333-13342)
AAAI 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results show that PRUNE-TUNE outperforms several strong competitors on the target domain test sets without sacrificing quality on the general domain, in both single- and multi-domain settings. |
| Researcher Affiliation | Collaboration | Jianze Liang (1,2), Chengqi Zhao (2), Mingxuan Wang (2), Xipeng Qiu (1), Lei Li (2); 1: School of Computer Science, Fudan University, Shanghai, China; 2: ByteDance AI Lab, China |
| Pseudocode | No | The paper does not contain any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | The source code and data are available at https://github.com/ohlionel/Prune-Tune. |
| Open Datasets | Yes | For TED talks, we used IWSLT14 as the training corpus... For the biomedicine domain, we evaluated on the EMEA News Crawl dataset. As there were no official validation and test sets for EMEA, we used the Khresmoi Medical Summary Translation Test Data 2.0. For the novel domain, we used a book dataset from OPUS (Tiedemann 2012)... For ZH→EN, we used the training corpora from the WMT19 ZH→EN translation task as the general domain data. We selected 6 target domain datasets from UM-Corpus (Tian et al. 2014). |
| Dataset Splits | Yes | Per-corpus splits (train / dev / test): EN→DE — WMT14: 3.9M / 3,000 / 3,003; IWSLT14: 170k / 6,750 / 1,305; EMEA: 587k / 500 / 1,000; Novel: 50k / 1,015 / 1,031. ZH→EN — WMT19: 20M / 3,000 / 3,981; Laws: 220k / 800 / 456; Thesis: 300k / 800 / 625; Subtitles: 300k / 800 / 598; Education: 449K / 800 / 791; News: 449K / 800 / 1,500; Spoken: 219k / 800 / 456. |
| Hardware Specification | Yes | All models were trained with a global batch size of 32,768 on NVIDIA Tesla V100 GPUs. |
| Software Dependencies | No | The paper mentions software such as 'sentencepiece', 'jieba', the Moses tokenizer, byte pair encoding (BPE), the Transformer, the Adam optimizer, and 'multi-bleu.perl', but does not provide specific version numbers for these software dependencies. |
| Experiment Setup | Yes | The embedding dimension was 1,024 and the size of the FFN hidden units was 4,096. The number of attention heads was set to 16 for both self-attention and cross-attention. We used the Adam optimizer (Kingma and Ba 2015) with the same schedule algorithm as Vaswani et al. (2017). All models were trained with a global batch size of 32,768... During inference, we used a beam width of 4 for both EN→DE and ZH→EN, and we set the length penalty to 0.6 for EN→DE and 1.0 for ZH→EN. A hedged configuration sketch is given below the table. |
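
Below is a minimal Python sketch of the training and inference configuration reported in the Experiment Setup row. The class and function names, the dataclass layout, and the warmup value of 4,000 steps are illustrative assumptions (the paper only states that it reuses the schedule of Vaswani et al. (2017)); the numeric hyperparameters are taken directly from the quoted text.

```python
# Hedged sketch of the reported training/inference configuration.
# Only the numbers quoted in the Experiment Setup row are grounded in the paper;
# names and the warmup default are illustrative assumptions.

from dataclasses import dataclass


@dataclass
class TransformerBigConfig:
    # Values quoted from the paper.
    embed_dim: int = 1024           # embedding dimension
    ffn_hidden_dim: int = 4096      # feed-forward hidden size
    num_attention_heads: int = 16   # self- and cross-attention heads
    global_batch_tokens: int = 32_768
    beam_size: int = 4
    length_penalty_en_de: float = 0.6
    length_penalty_zh_en: float = 1.0
    # Assumption: warmup steps are not stated in the section; 4,000 is the
    # value used by Vaswani et al. (2017).
    warmup_steps: int = 4000


def noam_learning_rate(step: int, d_model: int = 1024, warmup_steps: int = 4000) -> float:
    """Inverse-square-root schedule from Vaswani et al. (2017), which the
    paper reports reusing together with the Adam optimizer."""
    step = max(step, 1)
    return d_model ** -0.5 * min(step ** -0.5, step * warmup_steps ** -1.5)


if __name__ == "__main__":
    cfg = TransformerBigConfig()
    print(cfg)
    print("learning rate at step 1,000:",
          noam_learning_rate(1000, cfg.embed_dim, cfg.warmup_steps))
```
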