Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
“Bilingual Expert” Can Find Translation Errors
Authors: Kai Fan, Jiayi Wang, Bo Li, Fengming Zhou, Boxing Chen, Luo Si
Pages: 6367-6374
AAAI 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The experimental results show that our approach achieves the state-of-the-art performance in most public available datasets of WMT 2017/2018 QE task. |
| Researcher Affiliation | Industry | Kai Fan, Jiayi Wang, Bo Li, Fengming Zhou, Boxing Chen, Luo Si; Alibaba Group Inc.; k.fan, joanne.wjy, shiji.lb, zfm104435, boxing.cbx, EMAIL |
| Pseudocode | Yes | Algorithm 1 Translation Quality Estimation with Bi Transformer and Bi-LSTM |
| Open Source Code | No | The paper does not provide a link to its source code or explicitly state that the code for their methodology is open-source or publicly available. |
| Open Datasets | Yes | The data resources that we used for training the neural Bilingual Expert model are mainly from WMT (http://www.statmt.org/wmt18/): (i) parallel corpora released for the WMT17/18 News Machine Translation Task, (ii) UFAL Medical Corpus and Khresmoi development data released for the WMT17/18 Biomedical Translation Task, (iii) src-pe pairs for the WMT17/18 QE Task. |
| Dataset Splits | Yes | We evaluate our algorithm on the testing data of WMT 2017/2018, and development data of CWMT 2018. For fair comparison, we tuned all the hyper-parameters of our model on the development data, and reported the corresponding results for the testing data. |
| Hardware Specification | Yes | The bilingual expert model is trained on 8 Nvidia P-100 GPUs for about 3 days until convergence. For translation QE model, we use only one layer Bi-LSTM, and it is trained on a single GPU. |
| Software Dependencies | No | The paper mentions software like 'scikit-learn' and 'CRFSuite toolkit' but does not specify their version numbers or other required software dependencies with versions. |
| Experiment Setup | Yes | The number of layers in the bidirectional transformer for each module is 2, and the number of hidden units for feedforward sub-layer is 512. We use the 8-head self-attention in practice, since the single one is just a weighted average of previous layers. For translation QE model, we use only one layer Bi-LSTM... |
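
For readers attempting a reimplementation, the hyperparameters quoted in the Hardware Specification and Experiment Setup rows can be collected into a small configuration sketch. The class and field names below are hypothetical and chosen only for illustration; the paper does not release code, and only the numeric values come from the quoted text. Settings that the excerpt does not state (for example the model dimension, learning rate, or batch size) are deliberately left out rather than guessed.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class BilingualExpertConfig:
    """Hypothetical container; field names are illustrative, values are quoted in the table."""
    transformer_layers_per_module: int = 2  # bidirectional transformer layers per module
    feedforward_hidden_units: int = 512     # hidden units in the feedforward sub-layer
    attention_heads: int = 8                # 8-head self-attention
    training_gpus: int = 8                  # Nvidia P-100 GPUs, about 3 days of training


@dataclass(frozen=True)
class QEModelConfig:
    """Hypothetical container for the downstream translation QE model."""
    bilstm_layers: int = 1                  # "only one layer Bi-LSTM"
    training_gpus: int = 1                  # trained on a single GPU


if __name__ == "__main__":
    # Print both configurations as plain dictionaries for inspection.
    print(asdict(BilingualExpertConfig()))
    print(asdict(QEModelConfig()))
```

Keeping the two models in separate frozen dataclasses mirrors the two-stage setup described in the table (a bilingual expert model followed by a Bi-LSTM quality estimator) and makes it obvious which reported values are still missing for a full reproduction.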