Addressing the Under-Translation Problem from the Entropy Perspective

Authors: Yang Zhao, Jiajun Zhang, Chengqing Zong, Zhongjun He, Hua Wu

AAAI 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental results on various translation tasks show that our method can significantly improve the translation quality and substantially reduce the under-translation cases of high-entropy words.
Researcher Affiliation | Collaboration | Yang Zhao (1,2), Jiajun Zhang (1,2), Chengqing Zong (1,2,3), Zhongjun He (4), and Hua Wu (4); (1) National Laboratory of Pattern Recognition, Institute of Automation, CAS, Beijing, China; (2) University of Chinese Academy of Sciences, Beijing, China; (3) CAS Center for Excellence in Brain Science and Intelligence Technology, Beijing, China; (4) Baidu Inc., Beijing, China
Pseudocode | No | The paper describes its methods using text and mathematical equations, but it does not include any clearly labeled 'Pseudocode' or 'Algorithm' blocks.
Open Source Code | No | The paper does not provide any statement about releasing the source code for the methodology described, nor does it include links to a code repository.
Open Datasets | Yes | We test the proposed methods on Chinese-to-English (CH-EN), English-to-Japanese (EN-JA) and English-to-German (EN-DE) translation. In CH-EN translation, we use LDC corpus... In EN-JA translation, we use the KFTT dataset (http://www.phontron.com/kftt/), which includes 0.44M sentence pairs... In EN-DE translation, we use WMT 2014 EN-DE dataset...
Dataset Splits | Yes | In CH-EN translation, we use LDC corpus which includes 2.1M sentence pairs for training. NIST 2003 dataset is used for validation. NIST04-06 and 08 datasets are used for testing. In EN-JA translation, we use the KFTT dataset, which includes 0.44M sentence pairs for training, 1166 sentence pairs for validation and 1160 sentence pairs for testing. In EN-DE translation, we use WMT 2014 EN-DE dataset, which includes 4.5M sentence pairs for training. 2012-2013 datasets are used for validation and 2014 dataset is used for testing.
Hardware Specification | No | The paper does not provide any specific hardware details such as GPU models, CPU types, or memory specifications used for running the experiments.
Software Dependencies | No | The paper mentions using specific tools and models like 'fast-align tool', 'BPE method', 'LSTM layers', 'Transformer', and refers to GitHub repositories for baseline implementations (e.g., 'https://github.com/isi-nlp/Zoph_RNN', 'https://github.com/tensorflow/tensor2tensor'), but it does not specify version numbers for any of these software dependencies.
Experiment Setup | Yes | The word embedding dimension and the size of hidden layers are both set to 1,000. The minibatch size is set to 128 (Zoph and Knight 2016). ... In all methods, the entropy threshold e0 = 4. In the pre-training method, we first pretrain the model 10 epochs with the pseudo sentences. In the multitask method, the balance weight λ in Eq. (5) is set to 0.35. In the two-pass method, the balance weight λ in Eq. (9) is set to 0.3. All these hyper-parameters are fine-tuned on the validation set.
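For readers attempting reproduction, the hyper-parameters quoted in the Experiment Setup row can be collected into a single configuration. The sketch below is a minimal, hypothetical Python dictionary; the key names are illustrative assumptions (the authors did not release code), and only the values are taken from the paper.

```python
# Hypothetical summary of the hyper-parameters reported in the paper.
# Key names are assumptions, not taken from any released implementation.
EXPERIMENT_SETUP = {
    "embedding_dim": 1000,      # word embedding dimension
    "hidden_dim": 1000,         # size of hidden layers
    "minibatch_size": 128,      # following Zoph and Knight (2016)
    "entropy_threshold": 4.0,   # e0, shared by all proposed methods
    "pretrain_epochs": 10,      # pre-training on pseudo sentences (pre-training method)
    "lambda_multitask": 0.35,   # balance weight lambda in Eq. (5) (multitask method)
    "lambda_two_pass": 0.30,    # balance weight lambda in Eq. (9) (two-pass method)
}
```

Per the paper, all of these hyper-parameters were fine-tuned on the validation set, so the values should be treated as the authors' tuned choices rather than defaults.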