Translation Prediction with Source Dependency-Based Context Representation

Authors: Kehai Chen, Tiejun Zhao, Muyun Yang, Lemao Liu

AAAI 2017

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Evaluated on a large-scale Chinese-English translation task, the proposed approach achieves a significant improvement (of up to +1.9 BLEU points) over the baseline system and meanwhile outperforms a number of context-enhanced comparison systems.
Researcher Affiliation | Academia | Machine Intelligence and Translation Laboratory, Harbin Institute of Technology, Harbin, China; ASTREC, National Institute of Information and Communications Technology, Kyoto, Japan
Pseudocode | No | The paper describes the model architecture and training process with mathematical equations and textual descriptions, but does not include explicit pseudocode or algorithm blocks.
Open Source Code | No | The paper mentions external toolkits such as Moses, SRILM, GIZA++, the Stanford dependency parser, and word2vec, but does not provide access to source code for the methodology it describes.
Open Datasets | Yes | The training data contains 1.46 million sentence pairs from LDC corpora (footnote 4 of the paper: LDC2002E18, LDC2003E07, LDC2003E14, the Hansards portion of LDC2004T07, LDC2004T08, and LDC2005T06).
Dataset Splits | Yes | Minimum error rate training (MERT) (Och 2003) was used to optimize the feature weights on the NIST02 set, with NIST03/NIST04/NIST05 serving as test sets (a hedged MERT invocation sketch follows the table).
Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU or GPU models) used to run the experiments.
Software Dependencies | No | The paper names software such as Moses, the SRILM toolkit, GIZA++, the Stanford dependency parser, and the word2vec toolkit, but provides no version numbers for them (e.g., "srilm toolkit 3" in the text points to footnote 3, not version 3).
Experiment Setup | Yes | Most models had a vocabulary size of 50k. The word2vec toolkit was used to generate 100-dimensional embeddings for words in historical DBiCUs and 500-dimensional embeddings for words in the predicted DBiCU. These parameters were optimized by 10 epochs of stochastic gradient descent with a minibatch size of 500 and a learning rate of 1 (see the training sketch after the table).
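
For the Dataset Splits row, the tuning step matches Moses' stock MERT driver. The sketch below is a minimal reconstruction, not the authors' script: all file names and paths are hypothetical, and only the positional arguments of mert-moses.pl are shown.

```python
import subprocess

# Hypothetical file layout; the paper does not release its scripts.
DEV_SRC = "nist02.src"                    # NIST02 source side (tuning set)
DEV_REF = "nist02.ref"                    # reference prefix (nist02.ref0, ...)
MOSES_BIN = "mosesdecoder/bin/moses"      # decoder binary
MOSES_INI = "model/moses.ini"             # decoder configuration

# mert-moses.pl (shipped with Moses) takes the tuning source, the reference
# prefix, the decoder binary, and the decoder config as positional arguments,
# then iteratively re-decodes NIST02 and re-optimizes the feature weights
# (Och 2003). NIST03/NIST04/NIST05 are decoded afterwards for evaluation.
subprocess.run(
    ["perl", "mosesdecoder/scripts/training/mert-moses.pl",
     DEV_SRC, DEV_REF, MOSES_BIN, MOSES_INI],
    check=True,
)
```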
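
For the Experiment Setup row, the reported embedding and optimizer settings translate into roughly the following. This is a sketch under stated assumptions, not the authors' implementation: gensim (≥ 4) stands in for the original word2vec toolkit, the training file name is hypothetical, and a placeholder layer stands in for the unpublished network.

```python
from gensim.models import Word2Vec
import torch

# Hypothetical tokenized training file; the LDC corpora are licensed data.
with open("train.tok.txt", encoding="utf-8") as f:
    sentences = [line.split() for line in f]

# Reported dimensions: 100-d embeddings for words in historical DBiCUs,
# 500-d embeddings for words in the predicted DBiCU.
emb_hist = Word2Vec(sentences, vector_size=100)
emb_pred = Word2Vec(sentences, vector_size=500)

# Reported optimizer settings: 10 epochs of SGD, minibatch size 500,
# learning rate 1. The Linear layer is only a placeholder for the paper's
# actual scoring network, which is not released.
model = torch.nn.Linear(100, 500)
optimizer = torch.optim.SGD(model.parameters(), lr=1.0)
EPOCHS, BATCH_SIZE = 10, 500
```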