Neural Machine Translation with Joint Representation

Authors: Yanyang Li, Qiang Wang, Tong Xiao, Tongran Liu, Jingbo Zhu. Pages 8285-8292.

AAAI 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In our experiments, the Reformer models achieve 1.3, 0.8 and 0.7 BLEU point improvements over the Transformer baseline on the small-scale IWSLT15 Vietnamese-English and IWSLT14 German-English, English-German datasets, as well as 1.9 on the large-scale NIST12 Chinese-English dataset.
Researcher Affiliation | Collaboration | Yanyang Li (1), Qiang Wang (1), Tong Xiao (1,2), Tongran Liu (3), Jingbo Zhu (1,2); (1) Natural Language Processing Lab., Northeastern University, Shenyang, China; (2) NiuTrans Co., Ltd., Shenyang, China; (3) CAS Key Laboratory of Behavioral Science, Institute of Psychology, CAS, Beijing, China
Pseudocode | No | No, the paper does not contain any clearly labeled pseudocode or algorithm blocks, nor are there any structured steps formatted in a code-like manner.
Open Source Code | Yes | The code is publicly available at https://github.com/lyy1994/reformer.
Open Datasets | Yes | We evaluated our approach on IWSLT15 Vietnamese-English (Vi-En), IWSLT14 German-English (De-En), English-German (En-De) and NIST12 Chinese-English (Zh-En) translation tasks. For Vi-En translation, the training set consisted of 130K sentence pairs and we used tst2012 as the validation set and tst2013 as the test set. For De-En and En-De translations, the training set consisted of 160K sentence pairs and we randomly drew 7K samples from the training set as the validation set. We concatenated dev2010, dev2012, tst2010, tst2011 and tst2012 as the test set. For Zh-En translation, we used the 1.8M-sentence Chinese-English bitext provided within NIST12 OpenMT.
Dataset Splits | Yes | For Vi-En translation, the training set consisted of 130K sentence pairs and we used tst2012 as the validation set and tst2013 as the test set. For De-En and En-De translations, the training set consisted of 160K sentence pairs and we randomly drew 7K samples from the training set as the validation set. We concatenated dev2010, dev2012, tst2010, tst2011 and tst2012 as the test set. For Zh-En translation, we used the 1.8M-sentence Chinese-English bitext provided within NIST12 OpenMT. We chose the evaluation data of NIST MT06 as the validation set, and MT05 and MT08 as the test sets.
Hardware Specification | Yes | All experiments were done on 8 Titan V GPUs with half-precision training.
Software Dependencies | No | No, the paper mentions using the 'open-source implementation of the Transformer model (Ott et al. 2019)' (which refers to fairseq), the 'Adam optimizer', and 'BPE', but it does not specify version numbers for any software libraries, toolkits, or dependencies used in the experiments.
Experiment Setup | Yes | The model consisted of a 6-layer encoder and decoder. The size of the embedding, the number of heads and the hidden layer of the feed-forward network were set to 256/512, 4/8 and 1024/2048 for the IWSLT/NIST datasets. Dropout was set to 0.1 for all experiments. For training, we used the Adam optimizer (Kingma and Ba 2015), where the learning rate and batch size were set to 0.0007 and 4096 × 8 tokens.
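
The Dataset Splits row above fully specifies how the De-En/En-De validation set was carved out of the training data: 7K randomly drawn pairs from roughly 160K. The snippet below is a minimal sketch of that hold-out step; the file names, random seed and the helper split_train_valid are illustrative assumptions, not part of the authors' released code.

```python
# Illustrative sketch of the De-En/En-De split described in the table:
# hold out 7K randomly drawn sentence pairs from the ~160K-pair training set
# as the validation set. File names and the seed are assumptions.
import random

def split_train_valid(src_path, tgt_path, valid_size=7000, seed=1):
    with open(src_path, encoding="utf-8") as f_src, \
         open(tgt_path, encoding="utf-8") as f_tgt:
        pairs = list(zip(f_src.read().splitlines(), f_tgt.read().splitlines()))

    rng = random.Random(seed)
    valid_idx = set(rng.sample(range(len(pairs)), valid_size))

    train = [p for i, p in enumerate(pairs) if i not in valid_idx]
    valid = [p for i, p in enumerate(pairs) if i in valid_idx]
    return train, valid

# Example usage (hypothetical file names):
# train_pairs, valid_pairs = split_train_valid("train.de", "train.en")
```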
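
The Experiment Setup row gives the IWSLT-scale baseline dimensions: embedding size 256, 4 heads, feed-forward hidden size 1024, 6 encoder and decoder layers, dropout 0.1, and Adam with learning rate 0.0007. The sketch below wires those numbers into a stock PyTorch nn.Transformer to make the configuration concrete; it is not the authors' Reformer from https://github.com/lyy1994/reformer, and the vocabulary size and padding index are assumptions.

```python
# Minimal sketch of the IWSLT-scale baseline configuration from the table.
# This is a stock PyTorch Transformer, not the authors' Reformer model.
import torch
import torch.nn as nn

class BaselineTransformer(nn.Module):
    def __init__(self, vocab_size=10000, d_model=256, nhead=4,
                 ffn_dim=1024, layers=6, dropout=0.1):
        super().__init__()
        # vocab_size and padding_idx are assumptions; the paper uses BPE
        # but does not pin a vocabulary size in the quoted setup.
        self.embed = nn.Embedding(vocab_size, d_model, padding_idx=0)
        self.transformer = nn.Transformer(
            d_model=d_model, nhead=nhead,
            num_encoder_layers=layers, num_decoder_layers=layers,
            dim_feedforward=ffn_dim, dropout=dropout, batch_first=True)
        self.out = nn.Linear(d_model, vocab_size)

    def forward(self, src_tokens, tgt_tokens):
        # src_tokens, tgt_tokens: (batch, seq_len) LongTensors of token ids
        hidden = self.transformer(self.embed(src_tokens), self.embed(tgt_tokens))
        return self.out(hidden)  # (batch, tgt_len, vocab_size) logits

# IWSLT-scale settings; the NIST-scale model would use 512/8/2048 instead.
model = BaselineTransformer()
optimizer = torch.optim.Adam(model.parameters(), lr=0.0007)
```

Half-precision training on the 8 Titan V GPUs, as reported in the Hardware Specification row, could be layered on top of such a setup with PyTorch's torch.cuda.amp utilities, which are omitted here for brevity.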