Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Denoising Pre-training for Machine Translation Quality Estimation with Curriculum Learning

Authors: Xiang Geng, Yu Zhang, Jiahuan Li, Shujian Huang, Hao Yang, Shimin Tao, Yimeng Chen, Ning Xie, Jiajun Chen

AAAI 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments on various benchmarks reveal that CLQE outperforms Direct QE and other strong baselines.
Researcher Affiliation | Collaboration | 1 National Key Laboratory for Novel Software Technology, Nanjing University, Nanjing, China; 2 Huawei Translation Services Center, Beijing, China; EMAIL, EMAIL, EMAIL
Pseudocode | Yes | Algorithm 1: Denoising pre-training for machine translation quality estimation.
Open Source Code | Yes | We make our CLQE code available (https://github.com/NJUNLP/njuqe). ... We provide our implementation online.
Open Datasets | Yes | We employ WMT19 and WMT20/WMT21 QE dataset for English-German (EN-DE) and English-Chinese (EN-ZH) direction respectively. ... https://www.statmt.org/wmt##, ## can be 19, 20, 21.
Dataset Splits | Yes | The size of training, development, and test sets are 13K/1K/1K, 7K/1K/1K, and 8K/1K/1K for WMT19, 20, and 21 QE tasks, respectively. ... the pre-trained model is selected with the pseudo validation set for further fine-tuning.
Hardware Specification | Yes | All experiments are performed on NVIDIA V100 GPUs.
Software Dependencies | No | The paper mentions using Fairseq(-py) but does not provide a specific version number. Other mentioned software or models (e.g., XLM-R, GPT-2) do not include version details for their implementations.
Experiment Setup | Yes | We set the initial competence c0 = 0.05 and total duration of curriculum learning T = 5 epochs. Other details can be found in supplementary materials.
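
To give a feel for what the reported values c0 = 0.05 and T = 5 epochs control, the sketch below computes a competence schedule over training. This page does not state which competence function the paper uses, so the square-root form from competence-based curriculum learning is an assumption here, and the names `sqrt_competence` and `admissible_indices`, as well as the toy difficulty values, are hypothetical illustrations rather than part of the CLQE code.

```python
import math


def sqrt_competence(t, total_duration=5.0, c0=0.05):
    """Competence at training time t (in epochs).

    ASSUMPTION: square-root schedule; the source only reports
    c0 = 0.05 and T = 5 epochs, not the functional form.
    """
    if total_duration <= 0:
        raise ValueError("total_duration must be positive")
    if not 0.0 <= c0 <= 1.0:
        raise ValueError("c0 must lie in [0, 1]")
    progress = max(t, 0.0) / total_duration
    # Grows from c0 at t = 0 to 1 at t = total_duration, then stays at 1.
    return min(1.0, math.sqrt(progress * (1.0 - c0 ** 2) + c0 ** 2))


def admissible_indices(difficulties, competence):
    """Indices of examples whose difficulty (scaled to [0, 1]) does not
    exceed the current competence. Hypothetical helper for illustration."""
    return [i for i, d in enumerate(difficulties) if d <= competence]


if __name__ == "__main__":
    toy_difficulties = [0.02, 0.10, 0.40, 0.75, 0.99]  # made-up values
    for epoch in range(0, 6):
        c = sqrt_competence(epoch)
        usable = admissible_indices(toy_difficulties, c)
        print(f"epoch {epoch}: competence={c:.3f}, usable examples={usable}")
```

Under this assumed schedule, only the easiest 5% of examples (by difficulty rank) are available at the start, and the full training set becomes available once the curriculum duration T has elapsed.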