A Dependency-Based Neural Reordering Model for Statistical Machine Translation

Authors: Christian Hadiwinoto, Hwee Tou Ng

AAAI 2017 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments on Chinese-to-English translation show that our approach yields a statistically significant improvement of 0.57 BLEU point on benchmark NIST test sets, compared to our prior state-of-the-art statistical MT system that uses sparse dependency-based reordering features.
Researcher Affiliation | Academia | Christian Hadiwinoto, Hwee Tou Ng, Department of Computer Science, National University of Singapore, {chrhad, nght}@comp.nus.edu.sg
Pseudocode | No | The paper describes the neural network architecture and formulas but does not provide pseudocode or a clearly labeled algorithm block.
Open Source Code | No | The paper does not provide any statement or link indicating the release of open-source code for the described methodology.
Open Datasets | Yes | Our parallel training corpora are from LDC, which we divide into older corpora [1] and newer corpora [2]. ... [1] LDC2002E18, LDC2003E14, LDC2004E12, LDC2004T08, LDC2005T06, and LDC2005T10. [2] LDC2007T23, LDC2008T06, LDC2008T08, LDC2008T18, LDC2009T02, LDC2009T06, LDC2009T15, LDC2010T03, LDC2013T11, LDC2013T16, LDC2014T04, LDC2014T11, LDC2014T15, LDC2014T20, and LDC2014T26. ... To train the Chinese word embeddings as described above, we concatenate the Chinese side of our parallel texts with Chinese Gigaword version 5 (LDC2011T13) ... The language model (LM) is a 5-gram model trained on the English side of the FBIS parallel corpus (LDC2003E14) and the monolingual corpus English Gigaword version 4 (LDC2009T13) ... Training the neural reordering classifier involves LDC manually-aligned corpora, from which we extracted 572K head-child pairs and 1M sibling pairs as training instances [6] ... [6] LDC2012T20, LDC2012T24, LDC2013T05, LDC2013T23, LDC2014T25, LDC2015T04, and LDC2015T18.
Dataset Splits | Yes | Training the neural reordering classifier involves LDC manually-aligned corpora, from which we extracted 572K head-child pairs and 1M sibling pairs as training instances [6], while retaining 90,233 head-child pairs and 146,112 sibling pairs as held-out tuning instances [7]. The latter is used to pick the best neural network parameters. Our translation development set is MTC corpus version 1 (LDC2002T01) and version 3 (LDC2004T07).
Hardware Specification | No | The paper does not provide specific details about the hardware used to run the experiments (e.g., CPU/GPU models, memory specifications).
Software Dependencies | No | The paper mentions software like Moses, GIZA++, and Mate parser, but does not provide specific version numbers for these or any other ancillary software components.
Experiment Setup | Yes | We set the word vocabulary to the 100,000 most frequent words in our parallel training corpora, replacing other words with a special UNK token, in addition to all POS tags, dependency labels, and Boolean features. We set the embedding dimension size to 100, the lower hidden layer dimension size to 200, and the upper hidden layer dimension size to 100. We trained for 100 epochs, with 128 mini-batches per epoch, and used a dropout rate of 0.5. For model ensemble, we trained 10 classifiers for head-child reordering and 10 for sibling reordering, each of which forming one feature function.
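Since the paper provides no pseudocode, the following is a minimal sketch of a classifier matching the quoted setup. The vocabulary size, embedding dimension (100), lower hidden layer size (200), upper hidden layer size (100), dropout rate (0.5), and mini-batch size (128) come from the Experiment Setup row; the tanh activations, the number of input feature slots (n_slots), the number of reordering classes (n_classes), and the use of PyTorch are assumptions, not details stated in the paper.

```python
import torch
import torch.nn as nn

class ReorderingClassifier(nn.Module):
    """Hedged sketch of a feed-forward reordering classifier: shared 100-dim
    embeddings over a symbol vocabulary (100K most frequent words plus POS tags,
    dependency labels, and Boolean features), a 200-dim lower hidden layer,
    a 100-dim upper hidden layer, dropout 0.5, and a softmax output."""

    def __init__(self, n_symbols=100_000, n_slots=8, n_classes=2,
                 emb_dim=100, lower_dim=200, upper_dim=100, dropout=0.5):
        super().__init__()
        self.embed = nn.Embedding(n_symbols, emb_dim)
        self.lower = nn.Linear(n_slots * emb_dim, lower_dim)
        self.upper = nn.Linear(lower_dim, upper_dim)
        self.out = nn.Linear(upper_dim, n_classes)   # softmax applied via cross-entropy loss
        self.drop = nn.Dropout(dropout)

    def forward(self, feature_ids):
        # feature_ids: (batch, n_slots) symbol indices for one head-child or sibling pair
        x = self.embed(feature_ids).flatten(1)       # concatenate the slot embeddings
        x = self.drop(torch.tanh(self.lower(x)))     # lower hidden layer (tanh is an assumption)
        x = self.drop(torch.tanh(self.upper(x)))     # upper hidden layer
        return self.out(x)                           # logits over reordering classes

# Minimal shape check with random indices; 128 matches the quoted mini-batch size.
model = ReorderingClassifier()
logits = model(torch.randint(0, 100_000, (128, 8)))
print(logits.shape)  # torch.Size([128, 2])
```

Under the quoted ensemble setup, ten such classifiers would be trained for head-child reordering and ten for sibling reordering, each contributing one feature function to the translation system.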