Modeling Voting for System Combination in Machine Translation
Authors: Xuancheng Huang, Jiacheng Zhang, Zhixing Tan, Derek F. Wong, Huanbo Luan, Jingfang Xu, Maosong Sun, Yang Liu
IJCAI 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments show that our approach is capable of better taking advantage of the consensus between hypotheses and achieves significant improvements over state-of-the-art baselines on Chinese-English and English-German machine translation tasks. |
| Researcher Affiliation | Collaboration | Xuancheng Huang1, Jiacheng Zhang1, Zhixing Tan1, Derek F. Wong2, Huanbo Luan1, Jingfang Xu3, Maosong Sun1 and Yang Liu1,4,5 (1: Dept. of Comp. Sci. & Tech., BNRist Center, Institute for AI, Tsinghua University; 2: NLP2CT Lab / Department of Computer and Information Science, University of Macau; 3: Sogou Inc.; 4: Beijing Advanced Innovation Center for Language Resources; 5: Beijing Academy of Artificial Intelligence) |
| Pseudocode | No | The paper does not contain any pseudocode or algorithm blocks. |
| Open Source Code | Yes | We release our source code at GitHub: https://github.com/THUNLP-MT/Voting4SC |
| Open Datasets | Yes | For the Chinese-English task, the training set contains about 1.25M sentence pairs from LDC with 27.9M Chinese words and 34.5M English words (the training set includes LDC2002E18, LDC2003E07, LDC2003E14, part of LDC2004T07, LDC2004T08 and LDC2005T06). We used the NIST 2006 dataset as the development set. The NIST 2002, 2003, 2004, 2005, and 2008 datasets were used as test sets. For the English-German task, the training set is the WMT 2014 training data with 4.5M sentence pairs, the validation set is newstest2013, and the test set is newstest2014. |
| Dataset Splits | Yes | For the Chinese-English task, the training set contains about 1.25M sentence pairs from LDC with 27.9M Chinese words and 34.5M English words. We used the NIST 2006 dataset as the development set. ... For the English-German task, the training set is the WMT 2014 training data with 4.5M sentence pairs, the validation set is newstest2013, and the test set is newstest2014. |
| Hardware Specification | No | The paper does not specify the hardware used to run the experiments. |
| Software Dependencies | No | The paper mentions the Adam optimizer [Kingma and Ba, 2014], the Transformer architecture [Vaswani et al., 2017], and the THUMT toolkit, but it does not specify version numbers for any software dependency required to replicate the experiments. |
| Experiment Setup | Yes | We used the same hyper-parameter setting for both baselines and our approach. The number of layers was set to 6 for both encoders and the decoder. The hidden size was set to 512 and the filter size was set to 2,048. The number of individual attention heads was set to 8 for multi-head attention. We tied all three src, hyp, trg embeddings for the English-German task. The embeddings and softmax weights were tied for both language pairs. In training, we used Adam [Kingma and Ba, 2014] for optimization. Each mini-batch contains 19K tokens for the Chinese-English task and 25K tokens for the English-German task. We used the learning rate decay policy described by [Vaswani et al., 2017]. In decoding, the beam size was set to 4 for both language pairs and the length penalty was set to 1.0 and 0.6 for Chinese-English and English-German, respectively. The other hyper-parameter settings were the same as the Transformer-base model [Vaswani et al., 2017]. |
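The hyper-parameters quoted in the Experiment Setup row can be collected into a single configuration sketch. This is only an illustration of the reported values: the key names below are invented for readability and are not the actual THUMT configuration options.

```python
# Hyper-parameter summary reconstructed from the quoted experiment setup.
# Key names are illustrative, not THUMT option names.
config = {
    "num_layers": 6,            # for both encoders and the decoder
    "hidden_size": 512,
    "filter_size": 2048,
    "num_heads": 8,
    "optimizer": "Adam",        # [Kingma and Ba, 2014]
    "batch_tokens": {"zh-en": 19_000, "en-de": 25_000},
    "beam_size": 4,
    "length_penalty": {"zh-en": 1.0, "en-de": 0.6},
    # en-de additionally ties the src/hyp/trg embeddings;
    # both pairs tie embeddings with the softmax weights.
    "tie_src_hyp_trg_embeddings": {"zh-en": False, "en-de": True},
}

# Sanity checks: these ratios match the Transformer-base defaults
# (filter size = 4 x hidden size; hidden size divisible by head count).
assert config["filter_size"] == 4 * config["hidden_size"]
assert config["hidden_size"] % config["num_heads"] == 0
```

All remaining settings are stated to follow the Transformer-base model [Vaswani et al., 2017], so anything not listed here should be taken from that configuration.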