Local Translation Prediction with Global Sentence Representation

Authors: Jiajun Zhang, Dakun Zhang, Jie Hao

IJCAI 2015 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | The large-scale experiments show that our method can obtain substantial improvements in translation quality over the strong baseline: the hierarchical phrase-based translation model augmented with the neural network joint model.
Researcher Affiliation | Collaboration | Jiajun Zhang, Dakun Zhang and Jie Hao; National Laboratory of Pattern Recognition, CASIA, Beijing, China; Toshiba (China) R&D Center; jjzhang@nlpr.ia.ac.cn, {zhangdakun,haojie}@toshiba.com.cn
Pseudocode | No | The paper includes mathematical formulations and architectural diagrams (Figure 2 and Figure 4) but no explicit pseudocode or algorithm blocks.
Open Source Code | No | The paper contains no statement about releasing source code and no link to a code repository for the described methodology.
Open Datasets | Yes | The bilingual training data from LDC contains about 2.1 million sentence pairs. This bilingual data is also utilized to train the two neural networks. The 5-gram language model is trained on the English part of the bilingual training data and the Xinhua portion of the English Gigaword corpus. (Footnote 3 lists the specific LDC dataset IDs: LDC2000T50, LDC2002L27, LDC2003E07, LDC2003E14, LDC2004T07, LDC2005T06, LDC2005T10 and LDC2005T34.)
Dataset Splits | Yes | NIST MT03 is used as the tuning data. MT05, MT06 and MT08 (news data) are used as the test data.
Hardware Specification | No | The paper does not provide hardware details such as GPU or CPU models, memory, or cloud instance types used for the experiments.
Software Dependencies | No | The paper mentions word2vec and Noise-Contrastive Estimation (NCE) but does not give version numbers for these or for any other software dependencies or libraries.
Experiment Setup | Yes | For the bilingually-constrained chunk-based CNN, the initial 192-dimensional word embeddings are trained with word2vec... We set the context window h = 3 for convolution. We will test multiple settings of the chunk number (C = 1, 2, 4, 8)... We apply L = 100 filters. The two fully connected linear layers both contain 192 neurons. The dropout ratio in the dropout layer is set to 0.5 to prevent overfitting. The standard back-propagation and stochastic gradient descent (SGD) algorithm is utilized to optimize this network. For the feed-forward neural network, we also apply the SGD algorithm. In our experiments, following [Devlin et al., 2014] we use n = 4 and m = 11.
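
For orientation, below is a minimal PyTorch sketch of a chunk-based convolutional sentence encoder configured with the hyperparameters quoted in the Experiment Setup row (192-dimensional embeddings, convolution window h = 3, L = 100 filters, C chunks, two 192-unit fully connected layers, dropout 0.5, plain SGD). Since the paper releases no code, the class name ChunkCNNEncoder, the vocabulary size, and the pooling and activation choices are assumptions made for illustration; this is an approximation of the described setup, not the authors' implementation. The feed-forward joint model with n = 4 and m = 11 (following Devlin et al., 2014) is not sketched here.

```python
import torch
import torch.nn as nn


class ChunkCNNEncoder(nn.Module):
    """Sketch of a chunk-based CNN sentence encoder.

    Hyperparameters follow the paper's reported setup; the architectural
    details (padding, tanh activations, chunk-wise max pooling) are
    assumptions, not the authors' released code.
    """

    def __init__(self, vocab_size, emb_dim=192, window=3,
                 num_filters=100, num_chunks=4, dropout=0.5):
        super().__init__()
        self.num_chunks = num_chunks
        # The paper initializes embeddings from word2vec; here they are random.
        self.embedding = nn.Embedding(vocab_size, emb_dim)
        self.conv = nn.Conv1d(emb_dim, num_filters,
                              kernel_size=window, padding=window // 2)
        # Two fully connected layers, each with 192 neurons.
        self.fc1 = nn.Linear(num_filters * num_chunks, 192)
        self.fc2 = nn.Linear(192, 192)
        self.dropout = nn.Dropout(dropout)

    def forward(self, token_ids):
        # token_ids: (batch, seq_len) -> (batch, emb_dim, seq_len)
        x = self.embedding(token_ids).transpose(1, 2)
        x = torch.tanh(self.conv(x))                  # (batch, num_filters, seq_len)
        # Chunk-wise max pooling: split the sequence into C chunks, pool each.
        chunks = torch.chunk(x, self.num_chunks, dim=2)
        pooled = torch.cat([c.max(dim=2).values for c in chunks], dim=1)
        h = torch.tanh(self.fc1(self.dropout(pooled)))
        return torch.tanh(self.fc2(h))                # global sentence representation


# Usage sketch: optimize with plain SGD, as stated in the setup.
model = ChunkCNNEncoder(vocab_size=50000, num_chunks=4)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
sentence_repr = model(torch.randint(0, 50000, (2, 30)))  # -> shape (2, 192)
```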