Local Translation Prediction with Global Sentence Representation
Authors: Jiajun Zhang, Dakun Zhang, Jie Hao
IJCAI 2015 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The large-scale experiments show that our method can obtain substantial improvements in translation quality over the strong baseline: the hierarchical phrase-based translation model augmented with the neural network joint model. |
| Researcher Affiliation | Collaboration | Jiajun Zhang, Dakun Zhang and Jie Hao; National Laboratory of Pattern Recognition, CASIA, Beijing, China; Toshiba (China) R&D Center; jjzhang@nlpr.ia.ac.cn, {zhangdakun,haojie}@toshiba.com.cn |
| Pseudocode | No | The paper includes mathematical formulations and architectural diagrams (Figure 2 and Figure 4) but no explicit pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide any explicit statements about releasing source code or links to a code repository for the methodology described. |
| Open Datasets | Yes | The bilingual training data from LDC contains about 2.1 million sentence pairs. This bilingual data is also utilized to train the two neural networks. The 5-gram language model is trained on the English part of the bilingual training data and the Xinhua portion of the English Gigaword corpus. (The paper's footnote 3 lists the specific LDC dataset IDs: LDC2000T50, LDC2002L27, LDC2003E07, LDC2003E14, LDC2004T07, LDC2005T06, LDC2005T10 and LDC2005T34.) |
| Dataset Splits | Yes | NIST MT03 is used as the tuning data. MT05, MT06 and MT08 (news data) are used as the test data. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU or CPU models, memory, or cloud instance types used for the experiments. |
| Software Dependencies | No | The paper mentions 'word2vec' and 'Noisy Contrastive Estimation (NCE)' but does not provide specific version numbers for these or any other software dependencies or libraries. |
| Experiment Setup | Yes | For the bilingually-constrained chunk-based CNN, the initial 192-dimensional word embeddings are trained with word2vec... We set the context window h = 3 for convolution. We will test multiple settings of the chunk number (C = 1, 2, 4, 8)... We apply L = 100 filters. The two fully connected linear layers both contain 192 neurons. The dropout ratio in the dropout layer is set to 0.5 to prevent overfitting. The standard back-propagation and stochastic gradient descent (SGD) algorithm is utilized to optimize this network. For the feed-forward neural network, we also apply the SGD algorithm. In our experiments, following [Devlin et al., 2014] we use n = 4 and m = 11. (A configuration sketch based on these settings appears below the table.) |
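
The setup row above lists concrete hyperparameters for the chunk-based CNN (192-dimensional word2vec embeddings, convolution window h = 3, chunk number C, L = 100 filters, two 192-neuron fully connected layers, dropout 0.5, plain SGD). The sketch below wires those numbers into a small PyTorch module. It is only an illustration of the reported configuration, not the authors' code: the class name `ChunkCNN`, the tanh nonlinearities, the chunk-wise max-pooling, and the learning rate are assumptions, since the paper excerpt does not specify them.

```python
import torch
import torch.nn as nn


class ChunkCNN(nn.Module):
    """Illustrative chunk-based CNN using the hyperparameters quoted above.

    Assumptions not stated in the excerpt: pre-trained 192-d word2vec
    embeddings are looked up outside this module, the sentence is split
    into C equal chunks, and each chunk is max-pooled into one vector.
    """

    def __init__(self, emb_dim=192, num_filters=100, window=3,
                 num_chunks=4, dropout=0.5):
        super().__init__()
        self.num_chunks = num_chunks
        # 1-D convolution over the word sequence with context window h = 3.
        self.conv = nn.Conv1d(emb_dim, num_filters,
                              kernel_size=window, padding=window // 2)
        # Two fully connected linear layers with 192 neurons each,
        # plus a dropout layer with ratio 0.5 to prevent overfitting.
        self.fc = nn.Sequential(
            nn.Linear(num_filters * num_chunks, 192),
            nn.Tanh(),
            nn.Dropout(dropout),
            nn.Linear(192, 192),
        )

    def forward(self, embeddings):
        # embeddings: (batch, sentence_len, 192) word2vec vectors.
        x = torch.tanh(self.conv(embeddings.transpose(1, 2)))  # (batch, L, len)
        # Max-pool within each of the C chunks, then concatenate.
        chunks = torch.chunk(x, self.num_chunks, dim=2)
        pooled = torch.cat([c.max(dim=2).values for c in chunks], dim=1)
        return self.fc(pooled)  # global sentence representation


# Minimal usage with plain SGD, as the setup row states; the learning rate
# and the random batch are placeholders.
model = ChunkCNN(num_chunks=4)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
sentence_batch = torch.randn(2, 20, 192)
sentence_vectors = model(sentence_batch)  # shape (2, 192)
```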