Syntactic Skeleton-Based Translation

Authors: Tong Xiao, Jingbo Zhu, Chunliang Zhang, Tongran Liu

AAAI 2016 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We experimented with our approach on Chinese-English (zh-en) and English-Chinese (en-zh) translation tasks. Table 1 shows the result, where our syntactic skeleton-based system is abbreviated as SYNSKEL.
Researcher Affiliation | Collaboration | 1. Northeastern University, Shenyang 110819, China; 2. YaTrans Co., Ltd., Shenyang 110004, China; 3. Institute of Psychology (CAS), Beijing 100101, China
Pseudocode | No | The paper describes processes and rules in text and with examples (e.g., Figures 1 and 2), but it does not contain a formally labeled "Pseudocode" or "Algorithm" block.
Open Source Code | No | The paper mentions the "NiuTrans open-source toolkit (Xiao et al. 2012)" as a tool used in the work, but it does not state that the code specific to the SYNSKEL model proposed in this paper is open source or available.
Open Datasets | Yes | We used 2.74 million sentence Chinese-English bitext from NIST12 OpenMT. We trained two 5-gram language models: one on the Xinhua portion of the English Gigaword in addition to the English side of the bitext, used by Chinese-English systems; one on the Xinhua portion of the Chinese Gigaword in addition to the Chinese side of the bitext, used by English-Chinese systems. (A hedged sketch of this language-model training follows the table.)
Dataset Splits | Yes | Our tuning sets (newswire: 1,198 sentences; web: 1,308 sentences) were drawn from the NIST MT 04-06 evaluation data and the GALE data. For English-Chinese translation, our tuning set (995 sentences) and test set (1,859 sentences) were the evaluation data sets of SSMT 07 and the NIST MT 08 Chinese-English track, respectively.
Hardware Specification | No | The paper does not specify any particular hardware (e.g., GPU or CPU models, or memory) used to run the experiments; it only refers to the decoding process and system speed-up.
Software Dependencies | No | The paper mentions using the "NiuTrans open-source toolkit (Xiao et al. 2012)" but does not provide a version number for this toolkit or name any other software dependencies, such as programming languages, libraries, or operating systems, used in the experiments.
Experiment Setup | Yes | All feature weights were tuned using minimum error rate training (MERT). By default, string-parsing was used and ωs was set to +∞. We trained two 5-gram language models: one on the Xinhua portion of the English Gigaword in addition to the English side of the bitext, used by Chinese-English systems; one on the Xinhua portion of the Chinese Gigaword in addition to the Chinese side of the bitext, used by English-Chinese systems. All language models were smoothed using the modified Kneser-Ney smoothing method. Syntax-based (tree-to-string) rules with up to five non-terminals were extracted on the entire set of the training data. For the hierarchical phrase-based system, hierarchical rules with up to two non-terminals were extracted from a 0.94 million sentence subset and phrasal rules were extracted from all the training data. To speed up the system, we further prune the search space in several ways. First, we discard lexicalized, partially syntactic rules whose scope is larger than 3. Also, we discard non-lexicalized, partially syntactic rules with X non-terminals on the RHS only. (A hedged sketch of these pruning heuristics follows the table.)
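
The "Open Datasets" and "Experiment Setup" rows quote the paper's language-model setup: two 5-gram models with modified Kneser-Ney smoothing, one per output language. The paper does not name the LM toolkit it used, so the sketch below is only an illustration: it assumes KenLM's lmplz binary (whose default estimator is modified Kneser-Ney) is on the PATH, and all corpus file names are hypothetical placeholders.

```python
# Hedged sketch: train 5-gram modified Kneser-Ney LMs as described in the paper.
# The paper does not say which LM toolkit was used; KenLM's `lmplz` is assumed
# here purely as a stand-in, and the file paths are hypothetical placeholders.
import subprocess

def train_5gram_lm(corpus_paths, arpa_out):
    """Stream the concatenated corpus files into lmplz (order 5) and write an ARPA file."""
    with open(arpa_out, "w", encoding="utf-8") as arpa:
        proc = subprocess.Popen(["lmplz", "-o", "5"], stdin=subprocess.PIPE, stdout=arpa)
        for path in corpus_paths:
            with open(path, "r", encoding="utf-8") as corpus:
                for line in corpus:
                    proc.stdin.write(line.encode("utf-8"))
        proc.stdin.close()
        proc.wait()

# English LM for the Chinese-English systems: Xinhua Gigaword + English side of the bitext.
train_5gram_lm(["gigaword_xinhua.en", "bitext.en"], "lm.en.arpa")
# Chinese LM for the English-Chinese systems: Xinhua Gigaword + Chinese side of the bitext.
train_5gram_lm(["gigaword_xinhua.zh", "bitext.zh"], "lm.zh.arpa")
```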
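
The pruning heuristics quoted at the end of the "Experiment Setup" row can be read as a filter over extracted translation rules. The Rule encoding below is hypothetical (the paper does not give its data structure), and "scope" is taken in the usual Hopkins and Langmead (2010) sense, which the excerpt does not restate; the sketch only makes the two discard conditions concrete.

```python
# Hedged sketch of the search-space pruning quoted above. The Rule class and the
# non-terminal convention are hypothetical; "scope" counts adjacent non-terminal
# pairs plus non-terminals at the rule edges (Hopkins & Langmead 2010).
from dataclasses import dataclass
from typing import List

@dataclass
class Rule:
    rhs: List[str]               # right-hand-side symbols, e.g. ["X", "of", "NP"]
    lexicalized: bool            # contains at least one terminal word
    partially_syntactic: bool    # mixes syntactic labels with generic X non-terminals

def is_nonterminal(symbol: str) -> bool:
    # Hypothetical convention: non-terminals are written as upper-case labels.
    return symbol.isupper()

def scope(rhs: List[str]) -> int:
    flags = [is_nonterminal(s) for s in rhs]
    edges = (int(flags[0]) + int(flags[-1])) if flags else 0
    adjacent = sum(1 for a, b in zip(flags, flags[1:]) if a and b)
    return edges + adjacent

def keep_rule(rule: Rule) -> bool:
    if rule.partially_syntactic:
        # "discard lexicalized, partially syntactic rules whose scope is larger than 3"
        if rule.lexicalized and scope(rule.rhs) > 3:
            return False
        # "discard non-lexicalized, partially syntactic rules with X non-terminals on the RHS only"
        if not rule.lexicalized and rule.rhs and all(s == "X" for s in rule.rhs):
            return False
    return True

rules = [
    Rule(["X", "of", "NP"], lexicalized=True, partially_syntactic=True),                 # kept (scope 2)
    Rule(["X", "X"], lexicalized=False, partially_syntactic=True),                       # pruned (all-X RHS)
    Rule(["NP", "likes", "X", "X", "NP"], lexicalized=True, partially_syntactic=True),   # pruned (scope 4)
]
pruned = [r for r in rules if keep_rule(r)]
```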