DirectQE: Direct Pretraining for Machine Translation Quality Estimation

Authors: Qu Cui, Shujian Huang, Jiahuan Li, Xiang Geng, Zaixiang Zheng, Guoping Huang, Jiajun Chen
Pages: 12719-12727

AAAI 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments on widely used benchmarks show that DirectQE outperforms existing methods, without using any pretraining models such as BERT. We also give extensive analyses showing how fixing the two gaps contributes to our improvements.
Researcher Affiliation | Collaboration | Qu Cui (1), Shujian Huang (1), Jiahuan Li (1), Xiang Geng (1), Zaixiang Zheng (1), Guoping Huang (2), Jiajun Chen (1); (1) National Key Laboratory for Novel Software Technology, Nanjing University, Nanjing, China; (2) Tencent AI Lab, Shenzhen, China
Pseudocode | No | The paper describes the proposed methods in narrative text and does not include any pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide any specific statements or links regarding the availability of open-source code for the described methodology.
Open Datasets | Yes | We carry out experiments on the WMT19 and WMT17 QE tasks for English-to-German (EN-DE) direction. ... These datasets are all officially released for the WMT QE Shared Task.
Dataset Splits | Yes | We carry out experiments on the WMT19 and WMT17 QE tasks... These datasets are all officially released for the WMT QE Shared Task. ... We randomly cut 2,000 sentence pairs out of the parallel data and use the generator to produce pseudo QE data with the Sample strategy. This dataset is used as the development set to monitor the pretraining process, and the performance on this dataset will be used for model selection. (See the sketch after the table.)
Hardware Specification | No | The paper describes model architectures and training details but does not provide any specific hardware specifications (e.g., GPU/CPU models, memory, or cloud instances) used for the experiments.
Software Dependencies | No | The paper mentions several models and libraries like Transformer, BERT, Bi-LSTM, and Hugging Face, but it does not specify software dependencies with version numbers (e.g., Python, PyTorch, or specific library versions).
Experiment Setup | Yes | Implementation Details. For NMT-based QE, the predictor consists of an encoder and a decoder, each a 6-layer Transformer with hidden-state dimension 512. ... For DirectQE, the detector is based on the Transformer ..., with one encoder and one decoder of the same size as in NMT-based QE. The generator is a 6-layer Transformer but with hidden-state dimension 256 for each layer ... We mask tokens with a 15% mask ratio. ... We use BPE (Sennrich, Haddow, and Birch 2015) in our experiments and set the number of BPE merge operations to 30,000.
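
To make the Dataset Splits and Experiment Setup rows concrete, the sketch below collects the reported hyperparameters and illustrates the two preprocessing steps the paper describes: holding out 2,000 parallel sentence pairs as a pretraining development set, and masking tokens at a 15% ratio before the generator produces pseudo QE data. This is a minimal illustration only; the authors do not release code, so all function and variable names here are assumptions, not the paper's implementation.

```python
import random

# Hyperparameters as reported in the paper's "Implementation Details"
# (training code is not released; this dict only summarizes the stated sizes).
CONFIG = {
    "predictor": {"layers": 6, "hidden_size": 512},  # NMT-based QE predictor (encoder + decoder)
    "detector":  {"layers": 6, "hidden_size": 512},  # DirectQE detector, same size as the predictor
    "generator": {"layers": 6, "hidden_size": 256},  # smaller Transformer that produces pseudo data
    "mask_ratio": 0.15,                              # fraction of tokens masked before generation
    "bpe_merge_operations": 30_000,                  # BPE steps (Sennrich, Haddow, and Birch 2015)
}


def split_dev_set(parallel_pairs, dev_size=2_000, seed=0):
    """Randomly cut `dev_size` sentence pairs out of the parallel data.

    Per the paper, the held-out pairs are turned into pseudo QE data with the
    Sample strategy and used to monitor pretraining and select the model.
    """
    rng = random.Random(seed)
    held_out = set(rng.sample(range(len(parallel_pairs)), dev_size))
    dev = [p for i, p in enumerate(parallel_pairs) if i in held_out]
    train = [p for i, p in enumerate(parallel_pairs) if i not in held_out]
    return train, dev


def mask_tokens(tokens, mask_ratio=CONFIG["mask_ratio"], mask_token="<mask>", seed=0):
    """Mask roughly 15% of the tokens at uniformly chosen positions."""
    rng = random.Random(seed)
    n_mask = max(1, round(len(tokens) * mask_ratio))
    positions = set(rng.sample(range(len(tokens)), n_mask))
    return [mask_token if i in positions else tok for i, tok in enumerate(tokens)]


if __name__ == "__main__":
    # Toy usage with dummy sentence pairs.
    pairs = [(f"src sentence {i}", f"tgt sentence {i}") for i in range(10_000)]
    train, dev = split_dev_set(pairs)
    print(len(train), len(dev))  # 8000 2000
    print(mask_tokens("das ist ein kleiner Test".split()))
```

The BPE step itself would be run with the subword segmentation tool of Sennrich, Haddow, and Birch (2015) using 30,000 merge operations, as stated in the row above; it is not reproduced in the sketch.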