DirectQE: Direct Pretraining for Machine Translation Quality Estimation
Authors: Qu Cui, Shujian Huang, Jiahuan Li, Xiang Geng, Zaixiang Zheng, Guoping Huang, Jiajun Chen
AAAI 2021, pp. 12719-12727 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on widely used benchmarks show that DirectQE outperforms existing methods, without using any pretraining models such as BERT. We also give extensive analyses showing how fixing the two gaps contributes to our improvements. |
| Researcher Affiliation | Collaboration | Qu Cui1, Shujian Huang1, Jiahuan Li1, Xiang Geng1, Zaixiang Zheng1, Guoping Huang2, Jiajun Chen1 1National Key Laboratory for Novel Software Technology, Nanjing University, Nanjing, China 2Tencent AI Lab, Shenzhen, China |
| Pseudocode | No | The paper describes the proposed methods in narrative text and does not include any pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide any specific statements or links regarding the availability of open-source code for the described methodology. |
| Open Datasets | Yes | We carry out experiments on the WMT19 and WMT17 QE tasks for English-to-German (EN-DE) direction. ... These datasets are all officially released for the WMT QE Shared Task. |
| Dataset Splits | Yes | We carry out experiments on the WMT19 and WMT17 QE tasks... These datasets are all officially released for the WMT QE Shared Task. ... We randomly cut 2,000 sentence pairs out of the parallel data and use the generator to produce pseudo QE data with the Sample strategy. This dataset is used as the development set to monitor the pretraining process, and the performance on this dataset will be used for model selection. (A split sketch follows the table.) |
| Hardware Specification | No | The paper describes model architectures and training details but does not provide any specific hardware specifications (e.g., GPU/CPU models, memory, or cloud instances) used for the experiments. |
| Software Dependencies | No | The paper mentions several models and libraries like Transformer, BERT, Bi-LSTM, and Hugging Face, but it does not specify software dependencies with version numbers (e.g., Python, PyTorch, or specific library versions). |
| Experiment Setup | Yes | Implementation Details. For NMT-based QE, the predictor consists of an encoder and a decoder, each a 6-layer Transformer with hidden-state dimension 512. ... For DirectQE, the detector is based on the Transformer ..., with one encoder and one decoder of the same size as in NMT-based QE. The generator is a 6-layer Transformer, but with hidden-state dimension 256 for each layer ... We mask tokens with a 15% mask ratio. ... We use BPE (Sennrich, Haddow, and Birch 2015) in our experiments and set BPE steps to 30,000. (A configuration sketch follows the table.) |
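
The development-set construction quoted in the Dataset Splits row amounts to a random hold-out of 2,000 parallel sentence pairs. The sketch below is a minimal illustration under that description, not the authors' code; the file names `train.en` and `train.de` and the fixed seed are assumptions.

```python
import random

def hold_out_dev_set(src_path, tgt_path, dev_size=2000, seed=0):
    """Randomly hold out `dev_size` sentence pairs from a parallel corpus,
    mirroring the 2,000-pair development set described in the paper."""
    with open(src_path, encoding="utf-8") as f_src, open(tgt_path, encoding="utf-8") as f_tgt:
        pairs = list(zip(f_src.read().splitlines(), f_tgt.read().splitlines()))

    rng = random.Random(seed)  # the paper does not report a seed; 0 is an assumption
    rng.shuffle(pairs)
    # Return (remaining training pairs, held-out development pairs)
    return pairs[dev_size:], pairs[:dev_size]

# Usage with hypothetical file names:
# train_pairs, dev_pairs = hold_out_dev_set("train.en", "train.de")
```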
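
The Experiment Setup row reports the main hyperparameters (6-layer Transformers with hidden dimensions 512 and 256, a 15% mask ratio, and 30,000 BPE steps). The snippet below only collects those reported values and sketches the token-masking step; the dictionary keys and the `mask_tokens` helper are illustrative names, not the authors' implementation.

```python
import random

# Hyperparameters as reported in the paper; the dictionary keys are illustrative.
DIRECTQE_SETUP = {
    "detector": {"encoder_layers": 6, "decoder_layers": 6, "hidden_dim": 512},
    "generator": {"layers": 6, "hidden_dim": 256},
    "mask_ratio": 0.15,      # 15% of target tokens are masked for the generator
    "bpe_merge_ops": 30000,  # BPE (Sennrich, Haddow, and Birch 2015)
}

def mask_tokens(tokens, mask_ratio=0.15, mask_symbol="<mask>", seed=None):
    """Randomly replace a fraction of tokens with a mask symbol and return the
    masked sequence plus the masked positions (which the generator would fill
    in to produce pseudo translations)."""
    rng = random.Random(seed)
    n_mask = max(1, round(len(tokens) * mask_ratio))
    masked_positions = set(rng.sample(range(len(tokens)), n_mask))
    masked = [mask_symbol if i in masked_positions else tok
              for i, tok in enumerate(tokens)]
    return masked, sorted(masked_positions)

# Example:
# masked, positions = mask_tokens("das ist ein kleiner Testsatz .".split(), seed=0)
```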