Neural Machine Translation with Universal Visual Representation

Authors: Zhuosheng Zhang, Kehai Chen, Rui Wang, Masao Utiyama, Eiichiro Sumita, Zuchao Li, Hai Zhao

ICLR 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments on four widely used translation datasets, including the WMT 16 English-to-Romanian, WMT 14 English-to-German, WMT 14 English-to-French, and Multi30K, show that the proposed approach achieves significant improvements over strong baselines.
Researcher Affiliation | Academia | 1 Department of Computer Science and Engineering, Shanghai Jiao Tong University; 2 Key Laboratory of Shanghai Education Commission for Intelligent Interaction and Cognitive Engineering, Shanghai Jiao Tong University, Shanghai, China; 3 MoE Key Lab of Artificial Intelligence, AI Institute, Shanghai Jiao Tong University; 4 National Institute of Information and Communications Technology (NICT), Kyoto, Japan
Pseudocode | Yes | Algorithm 1: Topic-image Lookup Table Conversion Algorithm
Open Source Code | Yes | The code is publicly available at https://github.com/cooelf/UVR-NMT.
Open Datasets | Yes | The Multi30K dataset contains 29K English→{German, French} parallel sentence pairs with visual annotations. For the EN-RO task, we experimented with the officially provided parallel corpus: Europarl v7 and SETIMES2 from WMT 16 with 0.6M sentence pairs. For the EN-DE translation task, 4.43M bilingual sentence pairs of the WMT 14 dataset were used as training data, including Common Crawl, News Commentary, and Europarl v7. For the EN-FR translation task, 36M bilingual sentence pairs from the WMT 14 dataset were used as training data.
Dataset Splits | Yes | For EN-RO, newsdev2016 served as the dev set and newstest2016 as the test set. For EN-DE, newstest2013 and newstest2014 were used as the dev set and test set, respectively. For EN-FR, newstest12 and newstest13 were combined for validation, and newstest14 was used as the test set, following the setting of Gehring et al. (2017). For Multi30K, the 1,014 English→{German, French} sentence pairs with visual annotations were used as the dev set.
Hardware Specification | Yes | All models were trained and evaluated on a single V100 GPU.
Software Dependencies | Yes | multi-bleu.perl was used to compute case-sensitive 4-gram BLEU scores for all test sets (https://github.com/moses-smt/mosesdecoder/tree/RELEASE-4.0/scripts/generic/multi-bleu.perl).
Experiment Setup | Yes | The number of dimensions of all input and output layers was set to 512 for the base model and 1024 for the big model. The inner feed-forward neural network layer was set to 2048. The heads of all multi-head modules were set to eight in both encoder and decoder layers. The byte pair encoding algorithm was adopted, and the size of the vocabulary was set to 40,000. In each training batch, a set of sentence pairs contained approximately 4096 * 4 source tokens and 4096 * 4 target tokens. During training, the value of label smoothing was set to 0.1, and the attention dropout and residual dropout were p = 0.1. We used the Adam optimizer (Kingma & Ba, 2014) to tune the parameters of the model. The learning rate was varied under a warm-up strategy with 8,000 steps. For evaluation, we validated the model at an interval of 1,000 batches on the dev set. For the Multi30K dataset, we trained the model for up to 10,000 steps, and training was early-stopped if the dev set BLEU score did not improve for ten epochs. For the EN-DE, EN-RO, and EN-FR tasks, following the training of 200,000 batches, the model with the highest BLEU score on the dev set was selected to evaluate the test sets. During decoding, the beam size was set to five.
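The pseudocode row above refers to the paper's Algorithm 1, which builds a lookup table from topic words to the images whose captions contain them. A minimal Python sketch of that idea, assuming captions are plain strings and the topic words have already been selected (function and variable names here are illustrative, not the authors'):

```python
from collections import defaultdict

def build_topic_image_table(captions, topic_words):
    """Map each topic word to the set of images whose caption mentions it."""
    table = defaultdict(set)
    for image_id, caption in captions:
        for token in caption.lower().split():
            if token in topic_words:
                table[token].add(image_id)
    return table

# Toy example: two captioned images and four pre-selected topic words.
captions = [("img1.jpg", "A dog runs on the grass"),
            ("img2.jpg", "A man rides a bike")]
topic_words = {"dog", "grass", "man", "bike"}
table = build_topic_image_table(captions, topic_words)
# table["dog"] == {"img1.jpg"}; table["bike"] == {"img2.jpg"}
```

At translation time, the source sentence's topic words index into this table to retrieve candidate images, which is what lets the method use visual grounding without paired images for every sentence.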
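The software row notes that multi-bleu.perl computes case-sensitive 4-gram BLEU. As a reference for what that metric does, here is a simplified single-reference, sentence-level sketch (clipped n-gram precisions combined with a brevity penalty), assuming whitespace-tokenized input; the Moses script itself operates at corpus level:

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """All n-grams of a token list, with multiplicity."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu(reference, hypothesis, max_n=4):
    """Single-reference sentence-level BLEU (case-sensitive, up to 4-grams)."""
    ref, hyp = reference.split(), hypothesis.split()
    precisions = []
    for n in range(1, max_n + 1):
        hyp_counts, ref_counts = ngrams(hyp, n), ngrams(ref, n)
        overlap = sum((hyp_counts & ref_counts).values())  # clipped matches
        precisions.append(overlap / max(sum(hyp_counts.values()), 1))
    if min(precisions) == 0:
        return 0.0
    brevity = 1.0 if len(hyp) >= len(ref) else math.exp(1 - len(ref) / len(hyp))
    return brevity * math.exp(sum(math.log(p) for p in precisions) / max_n)
```

A perfect match scores 1.0; any hypothesis missing all 4-gram overlap scores 0.0 under this geometric mean, which is why the metric is normally reported at corpus level.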
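The hyperparameters scattered through the setup row can be gathered in one place. A hedged summary as a plain Python dict; the values are the ones reported above, but the key names are illustrative and do not come from the authors' code:

```python
# Hyperparameters reported in the paper, collected for reference.
# Key names are illustrative, not taken from the released codebase.
TRANSFORMER_CONFIG = {
    "embed_dim": {"base": 512, "big": 1024},  # input/output layer size
    "ffn_dim": 2048,                          # inner feed-forward layer
    "attention_heads": 8,                     # encoder and decoder
    "bpe_vocab_size": 40_000,
    "tokens_per_batch": 4096 * 4,             # per side (source and target)
    "label_smoothing": 0.1,
    "attention_dropout": 0.1,
    "residual_dropout": 0.1,
    "optimizer": "adam",
    "warmup_steps": 8_000,
    "beam_size": 5,
}
```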