Asynchronous Bidirectional Decoding for Neural Machine Translation
Authors: Xiangwen Zhang, Jinsong Su, Yue Qin, Yang Liu, Rongrong Ji, Hongji Wang
AAAI 2018 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results on NIST Chinese-English and WMT English-German translation tasks demonstrate that our model achieves substantial improvements over the conventional NMT by 3.14 and 1.38 BLEU points, respectively. We evaluated the proposed model on NIST Chinese-English and WMT English-German translation tasks. |
| Researcher Affiliation | Academia | Xiamen University, Xiamen, China; Tsinghua University, Beijing, China |
| Pseudocode | No | The paper describes the model using mathematical equations and textual explanations but does not include any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | The source code of this work can be obtained from https://github.com/DeepLearnXMU/ABDNMT. |
| Open Datasets | Yes | For Chinese-English translation, the training data consists of 1.25M bilingual sentences with 27.9M Chinese words and 34.5M English words. These sentence pairs are mainly extracted from LDC2002E18, LDC2003E07, LDC2003E14, Hansards portion of LDC2004T07, LDC2004T08 and LDC2005T06. For English-German translation, we used WMT 2015 training data that contains 4.46M sentence pairs with 116.1M English words and 108.9M German words. |
| Dataset Splits | Yes | We chose NIST 2002 (MT02) dataset as our development set, and the NIST 2003 (MT03), 2004 (MT04), 2005 (MT05), and 2006 (MT06) datasets as our test sets. The news-test 2013 was used as development set and the news-test 2015 as test set. |
| Hardware Specification | Yes | We used a single GPU device 1080Ti to train models. |
| Software Dependencies | No | The paper mentions using Gated Recurrent Unit (GRU), Rmsprop, and byte pair encoding (BPE) but does not provide specific version numbers for any software libraries or frameworks (e.g., TensorFlow, PyTorch, Caffe) used for implementation. |
| Experiment Setup | Yes | During this procedure, we set the following hyper-parameters: word embedding dimension as 620, hidden layer size as 1000, learning rate as 5 × 10⁻⁴, batch size as 80, gradient norm as 1.0, and dropout rate as 0.3. We set beam sizes of all above-mentioned models as 10, and the beam sizes of the backward and forward decoders of our model as 1 and 10, respectively. (These settings are collected in the configuration sketch after this table.) |
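As a rough illustration of the reported setup, the hyper-parameters and beam settings quoted above can be gathered into a single configuration. This is a minimal sketch, not the authors' released code (https://github.com/DeepLearnXMU/ABDNMT); the dictionary name and keys are hypothetical and chosen only to mirror the values stated in the paper.

```python
# Hypothetical configuration sketch assembling the hyper-parameters
# reported in the paper; names are illustrative, not from the repo.
ABDNMT_CONFIG = {
    "word_embedding_dim": 620,   # reported word embedding dimension
    "hidden_size": 1000,         # reported hidden layer size
    "learning_rate": 5e-4,       # reported learning rate (5 x 10^-4)
    "batch_size": 80,
    "gradient_clip_norm": 1.0,   # reported gradient norm
    "dropout": 0.3,
    "optimizer": "rmsprop",      # optimizer named in the paper

    # Decoding: the backward decoder runs with beam size 1 to produce
    # reverse-order hidden states, which the forward decoder then
    # exploits while searching with beam size 10.
    "backward_beam_size": 1,
    "forward_beam_size": 10,
}

if __name__ == "__main__":
    for key, value in ABDNMT_CONFIG.items():
        print(f"{key}: {value}")
```

The asymmetric beam sizes reflect the asynchronous design: the backward pass only needs to supply auxiliary hidden states, so a greedy (beam 1) backward decode keeps inference cheap while the forward decoder retains the usual beam of 10.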