Towards Zero Unknown Word in Neural Machine Translation

Authors: Xiaoqing Li, Jiajun Zhang, Chengqing Zong

IJCAI 2016

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | Experiments on Chinese-to-English translation demonstrate that our proposed method can achieve more than 4 BLEU points over the attention-based NMT. |
| Researcher Affiliation | Academia | National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences; CAS Center for Excellence in Brain Science and Intelligence Technology |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide any concrete access to source code for the described methodology. |
| Open Datasets | Yes | The bilingual data to train the NMT model is selected from LDC, which contains about 0.6M sentence pairs. ... We use the word2vec toolkit [Mikolov et al., 2013] to train word vectors on the monolingual data, which is the combination of the source side of the bilingual data and the Chinese Gigaword Xinhua portion. ... the English language model is trained on the combination of the target side of the bilingual data and the English Gigaword. (See the word-vector sketch after this table.) |
| Dataset Splits | Yes | The NIST 03 dataset is chosen as the development set, which is used to monitor the training process and decide the early-stop condition. (See the early-stopping sketch after this table.) |
| Hardware Specification | No | The paper does not report the hardware (e.g., CPU/GPU models, memory) used to run its experiments. |
| Software Dependencies | No | The paper mentions software tools such as the Berkeley Aligner, the word2vec toolkit, and KenLM, but does not give version numbers for them. |
| Experiment Setup | Yes | We limit both the source and target vocabulary to 30k in our experiments. The number of hidden units is 1,000 for both the encoder and decoder, and the word embedding dimension is 500 for all source and target words. The parameters in the network are updated with the Adadelta algorithm. (See the configuration sketch after this table.) |
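
The Open Datasets row notes that word vectors are pretrained with the word2vec toolkit on Chinese monolingual text. Below is a minimal sketch of that step using gensim's Word2Vec as a stand-in for the original Mikolov C toolkit. The file name, window, min-count, and skip-gram settings are illustrative assumptions; the 500-dimensional size mirrors the embedding dimension reported in the experiment setup, though the paper does not state the size of the pretrained vectors themselves.

```python
# Word-vector pretraining sketch; gensim stands in for the word2vec toolkit.
from gensim.models import Word2Vec
from gensim.models.word2vec import LineSentence

# monolingual.zh: one tokenized sentence per line (hypothetical filename for
# the source side of the bitext plus the Chinese Gigaword Xinhua portion).
sentences = LineSentence("monolingual.zh")

model = Word2Vec(
    sentences,
    vector_size=500,  # mirrors the 500-dim embeddings in the experiment setup
    window=5,         # word2vec default; not specified in the paper
    min_count=5,      # word2vec default; not specified in the paper
    sg=1,             # skip-gram, a common choice; the paper does not say
    workers=4,
)
model.wv.save_word2vec_format("vectors.zh.txt")
```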
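
The Dataset Splits row quotes the paper's use of NIST 03 to monitor training and decide when to stop early. A minimal sketch of such a loop follows, under the assumption of patience-based stopping on dev BLEU; `train_one_epoch`, `score_dev`, and the patience and epoch counts are hypothetical placeholders that the paper does not specify.

```python
def train_with_early_stopping(train_one_epoch, score_dev,
                              max_epochs=30, patience=5):
    """Stop when dev BLEU fails to improve `patience` times in a row.

    `train_one_epoch` runs one pass over the training bitext;
    `score_dev` returns BLEU on the NIST 03 development set.
    Both are caller-supplied callables; the counts are illustrative.
    """
    best = float("-inf")
    bad_rounds = 0
    for epoch in range(max_epochs):
        train_one_epoch()
        bleu = score_dev()               # BLEU on the NIST 03 dev set
        if bleu > best:
            best, bad_rounds = bleu, 0   # new best: reset the counter
        else:
            bad_rounds += 1
            if bad_rounds >= patience:   # early-stop condition met
                break
    return best
```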
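
The Experiment Setup row collects the only architectural numbers the paper reports. The PyTorch sketch below simply wires those numbers together: 30k source and target vocabularies, 500-dim embeddings, 1,000 hidden units in both encoder and decoder, and Adadelta updates. The GRU cell, the single-layer depth, and the omission of the attention mechanism are simplifying assumptions; the paper builds on attention-based NMT but does not restate those details in the quoted passage.

```python
import torch
import torch.nn as nn

SRC_VOCAB = TGT_VOCAB = 30_000   # vocabulary limit from the paper
EMB_DIM = 500                    # word embedding dimension
HID_DIM = 1_000                  # hidden units, encoder and decoder

class EncoderDecoder(nn.Module):
    """Bare encoder-decoder with the reported sizes (attention omitted)."""
    def __init__(self):
        super().__init__()
        self.src_emb = nn.Embedding(SRC_VOCAB, EMB_DIM)
        self.tgt_emb = nn.Embedding(TGT_VOCAB, EMB_DIM)
        self.encoder = nn.GRU(EMB_DIM, HID_DIM, batch_first=True)
        self.decoder = nn.GRU(EMB_DIM, HID_DIM, batch_first=True)
        self.out = nn.Linear(HID_DIM, TGT_VOCAB)

    def forward(self, src, tgt):
        _, h = self.encoder(self.src_emb(src))        # encode the source
        dec_out, _ = self.decoder(self.tgt_emb(tgt), h)
        return self.out(dec_out)                      # logits over target vocab

model = EncoderDecoder()
optimizer = torch.optim.Adadelta(model.parameters())  # update rule from the paper
```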