Agreement-Based Joint Training for Bidirectional Attention-Based Neural Machine Translation
Authors: Yong Cheng, Shiqi Shen, Zhongjun He, Wei He, Hua Wu, Maosong Sun, Yang Liu
IJCAI 2016
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on Chinese-English and English-French translation tasks show that agreement-based joint training significantly improves both alignment and translation quality over independent training. |
| Researcher Affiliation | Collaboration | Institute for Interdisciplinary Information Sciences, Tsinghua University, Beijing, China; Baidu Inc., Beijing, China; State Key Laboratory of Intelligent Technology and Systems, Tsinghua National Laboratory for Information Science and Technology, Department of Computer Science and Technology, Tsinghua University, Beijing, China |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide any concrete access to source code for the methodology described. |
| Open Datasets | Yes | For Chinese-English, the training corpus from LDC consists of 2.56M sentence pairs... For English-French, the training corpus from WMT 2014 consists of 12.07M sentence pairs... |
| Dataset Splits | Yes | For Chinese-English, the NIST 2006 dataset was used as the validation set for hyper-parameter optimization and model selection, and the NIST 2002, 2003, 2004, 2005, and 2008 datasets as test sets. For English-French, the concatenation of news-test-2012 and news-test-2013 was used as the validation set and news-test-2014 as the test set. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory amounts) used for running its experiments. |
| Software Dependencies | No | The paper mentions software like MOSES, RNNSEARCH, and SRILM, but does not provide specific version numbers for these or other software dependencies. |
| Experiment Setup | Yes | The vocabulary size is set to 30K for all languages. The hyper-parameter λ that balances the preference between likelihood and agreement is set to 1.0 for Chinese-English and 2.0 for English-French. |
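Since the paper releases no code, the following minimal PyTorch sketch (not the authors' implementation) illustrates how the λ-weighted agreement term described in the setup combines the two directional translation likelihoods with a penalty on disagreement between the source-to-target and target-to-source attention matrices. The function names are hypothetical, and the squared-difference agreement term is one assumed variant of the paper's agreement measure.

```python
import torch


def agreement_loss(attn_f: torch.Tensor, attn_b: torch.Tensor) -> torch.Tensor:
    # attn_f: source-to-target attention weights, shape (tgt_len, src_len)
    # attn_b: target-to-source attention weights, shape (src_len, tgt_len)
    # Transpose the backward attention so both matrices align as
    # (tgt_len, src_len), then penalize their squared element-wise
    # difference. This is an assumed squared-difference variant of the
    # paper's agreement term, not necessarily the exact formulation.
    return ((attn_f - attn_b.t()) ** 2).sum()


def joint_loss(log_p_f: torch.Tensor, log_p_b: torch.Tensor,
               attn_f: torch.Tensor, attn_b: torch.Tensor,
               lam: float = 1.0) -> torch.Tensor:
    # Joint objective: maximize both directional log-likelihoods while
    # minimizing the disagreement between the two attention matrices.
    # lam is the λ from the paper's setup: 1.0 for Chinese-English,
    # 2.0 for English-French.
    return -(log_p_f + log_p_b) + lam * agreement_loss(attn_f, attn_b)


# Example with random attention matrices (5 target words, 7 source words)
# and placeholder sentence-level log-likelihoods.
attn_f = torch.softmax(torch.randn(5, 7), dim=-1)
attn_b = torch.softmax(torch.randn(7, 5), dim=-1)
loss = joint_loss(log_p_f=torch.tensor(-3.2), log_p_b=torch.tensor(-2.9),
                  attn_f=attn_f, attn_b=attn_b, lam=1.0)
print(loss)
```

In the paper's framing, minimizing this joint loss couples the two independently parameterized translation directions, which is what drives the reported improvements in both alignment and translation quality over independent training.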