Syntactic Skeleton-Based Translation
Authors: Tong Xiao, Jingbo Zhu, Chunliang Zhang, Tongran Liu
AAAI 2016
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We experimented with our approach on Chinese-English (zh-en) and English-Chinese (en-zh) translation tasks. Table 1 shows the result, where our syntactic skeleton-based system is abbreviated as SYNSKEL. |
| Researcher Affiliation | Collaboration | 1 Northeastern University, Shenyang 110819, China; 2 YaTrans Co., Ltd., Shenyang 110004, China; 3 Institute of Psychology (CAS), Beijing 100101, China |
| Pseudocode | No | The paper describes processes and rules in text and with examples (e.g., Figure 1 and 2), but it does not contain a formally labeled "Pseudocode" or "Algorithm" block. |
| Open Source Code | No | The paper mentions the "NiuTrans open-source toolkit (Xiao et al. 2012)" as a tool used in their work, but it does not state that the code specific to the SYNSKEL model proposed in this paper is open-source or available. |
| Open Datasets | Yes | We used 2.74 million sentence Chinese-English bitext from NIST12 OpenMT. We trained two 5-gram language models: one on the Xinhua portion of the English Gigaword in addition to the English side of the bitext, used by Chinese-English systems; one on the Xinhua portion of the Chinese Gigaword in addition to the Chinese side of the bitext, used by English-Chinese systems. |
| Dataset Splits | Yes | Our tuning sets (newswire: 1,198 sentences, web: 1,308 sentences) were drawn from the NIST MT 04-06 evaluation data and the GALE data. For English-Chinese translation, our tuning set (995 sentences) and test set (1,859 sentences) were the evaluation data sets of SSMT 07 and NIST MT 08 Chinese-English track, respectively. |
| Hardware Specification | No | The paper does not specify any particular hardware (e.g., GPU, CPU models, or memory) used for running the experiments. It only refers to the decoding process and system speed-up. |
| Software Dependencies | No | The paper mentions using the "NiuTrans open-source toolkit (Xiao et al. 2012)" but does not provide a version number for this toolkit or any other software dependencies such as programming languages, libraries, or operating systems used in the experiments. |
| Experiment Setup | Yes | All feature weights were tuned using minimum error rate training (MERT). By default, string-parsing was used and ω_s was set to +∞. We trained two 5-gram language models: one on the Xinhua portion of the English Gigaword in addition to the English side of the bitext, used by Chinese-English systems; one on the Xinhua portion of the Chinese Gigaword in addition to the Chinese side of the bitext, used by English-Chinese systems. All language models were smoothed using the modified Kneser-Ney smoothing method. Syntax-based (tree-to-string) rules with up to five non-terminals were extracted from the entire training data. For the hierarchical phrase-based system, hierarchical rules with up to two non-terminals were extracted from a 0.94 million sentence subset, and phrasal rules were extracted from all the training data. To speed up the system, we further prune the search space in several ways. First, we discard lexicalized, partially syntactic rules whose scope is larger than 3. Also, we discard non-lexicalized, partially syntactic rules with X non-terminals on the RHS only. (The smoothing formula and a rule-scope pruning sketch follow this table.) |
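
For context on the language-model setup quoted above, the following is the standard interpolated form of modified Kneser-Ney smoothing (Chen and Goodman 1998). It is shown for reference only; the paper names the method but does not reproduce the formula. The count-dependent discounts D_1, D_2, D_{3+} and the backoff weight γ are the usual ones.

```latex
% Standard modified Kneser-Ney interpolation (Chen & Goodman, 1998);
% reference form only -- the paper names the method but gives no formula.
P(w_i \mid w_{i-n+1}^{i-1}) =
  \frac{\max\bigl(c(w_{i-n+1}^{i}) - D\bigl(c(w_{i-n+1}^{i})\bigr),\, 0\bigr)}
       {\sum_{w'} c(w_{i-n+1}^{i-1} w')}
  + \gamma(w_{i-n+1}^{i-1})\, P(w_i \mid w_{i-n+2}^{i-1}),
\qquad
D(c) = \begin{cases}
  0      & \text{if } c = 0 \\
  D_1    & \text{if } c = 1 \\
  D_2    & \text{if } c = 2 \\
  D_{3+} & \text{if } c \ge 3
\end{cases}
```

Here γ(·) is chosen so that each conditional distribution sums to one, and the lower-order distributions are estimated from continuation counts rather than raw counts.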
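The scope-3 cutoff in the pruning step refers to rule scope in the sense of Hopkins and Langmead (2010): the number of nonterminals at the edges of a rule's source side plus the number of adjacent nonterminal pairs. Below is a minimal Python sketch of that filter, assuming rules' source sides are token lists; the `<X1>`-style nonterminal markers and the `scope`/`prune_by_scope` names are illustrative assumptions, not the authors' implementation.

```python
from typing import List

def is_nonterminal(token: str) -> bool:
    """Illustrative convention only: nonterminals are written like <X1>, <NP2>."""
    return token.startswith("<") and token.endswith(">")

def scope(src: List[str]) -> int:
    """Rule scope (Hopkins & Langmead 2010): nonterminals at the rule edges
    plus pairs of adjacent nonterminals. A grammar whose rules all have
    scope <= k can be parsed in O(n^k) time without binarization."""
    s = 0
    if src and is_nonterminal(src[0]):
        s += 1
    if src and is_nonterminal(src[-1]):
        s += 1
    s += sum(1 for a, b in zip(src, src[1:])
             if is_nonterminal(a) and is_nonterminal(b))
    return s

def prune_by_scope(rules: List[List[str]], max_scope: int = 3) -> List[List[str]]:
    """Drop rules whose source side exceeds the scope cutoff (scope-3 pruning)."""
    return [r for r in rules if scope(r) <= max_scope]

# "<X1> likes <X2>" -> scope 2 (both edges); "<X1> <X2>" -> scope 3 (edges + adjacency).
assert scope(["<X1>", "likes", "<X2>"]) == 2
assert scope(["<X1>", "<X2>"]) == 3
```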