Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
Simple and Scalable Nearest Neighbor Machine Translation
Authors: Yuhan Dai, Zhirui Zhang, Qiuzhi Liu, Qu Cui, Weihua Li, Yichao Du, Tong Xu
ICLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on machine translation in two general settings, static domain adaptation, and online learning, demonstrate that our proposed approach not only achieves almost 90% speed as the NMT model without performance degradation, but also significantly reduces the storage requirements of kNN-MT. |
| Researcher Affiliation | Collaboration | Yuhan Dai, Zhirui Zhang, Qiuzhi Liu, Qu Cui, Weihua Li, Yichao Du and Tong Xu; University of Science and Technology of China; Tencent AI Lab |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Our code is open-sourced on https://github.com/dirkiedai/sk-mt. |
| Open Datasets | Yes | For the domain adaptation task, we use the same multi-domain dataset as the baseline (Khandelwal et al., 2021) and consider IT, Medical, Koran, and Law in our experiments. The statistics of the dataset are shown in Appendix A.1. |
| Dataset Splits | Yes | Table 6: The statistics of multi-domain dataset. Train Sents: Koran 18k, IT 223k, Medical 248k, Law 467k. Dev Sents: 2000 per domain. Test Sents: 2000 per domain. |
| Hardware Specification | Yes | The hardware we use is 112 cores of Intel(R) Xeon(R) Gold 6258R CPU and a single GeForce RTX 2080 Ti GPU. |
| Software Dependencies | No | The paper mentions software like THUMT, FAISS, Elasticsearch, and the Moses toolkit, but does not provide specific version numbers for these or other ancillary software components. |
| Experiment Setup | Yes | During inference, we carefully tune the hyper-parameters on the development set by performing a grid search on k ∈ {1, 2, 3, 4}, m ∈ {1, 2, 4, 8, 16} and τ ∈ {5, 10, 20, 50, 100, 150, 200}. Based on the validation results, we select two widely-used model architectures in our experiments, m = 2, k = 1 as SK-MT1 and m = 16, k = 2 as SK-MT2, where the temperature τ-s are both set to 100. The beam size and length penalty are set to 4 and 0.6 for all datasets. |
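
The grid search quoted in the Experiment Setup row can be sketched roughly as follows. This is a minimal illustration, not the authors' tuning script: only the three value lists come from the quoted text, while `grid_search`, `score_fn`, and `toy_score` are hypothetical names introduced here. In practice `score_fn` would decode the development set with a given (k, m, τ) and return a quality score such as BLEU; the toy objective below merely stands in for that step.

```python
from itertools import product

# Search space as quoted in the Experiment Setup row.
K_VALUES = [1, 2, 3, 4]
M_VALUES = [1, 2, 4, 8, 16]
TAU_VALUES = [5, 10, 20, 50, 100, 150, 200]


def grid_search(score_fn):
    """Return (best_config, best_score) over every (k, m, tau) combination.

    score_fn is a hypothetical callable mapping (k, m, tau) to a
    development-set score where higher is better; it is an assumption
    of this sketch and not something defined in the paper.
    """
    best_config, best_score = None, float("-inf")
    for k, m, tau in product(K_VALUES, M_VALUES, TAU_VALUES):
        score = score_fn(k, m, tau)
        if score > best_score:
            best_config, best_score = (k, m, tau), score
    return best_config, best_score


def toy_score(k, m, tau):
    # Placeholder objective for illustration only (assumption):
    # it is maximal at k=2, m=16, tau=100 and says nothing about real BLEU.
    return -abs(k - 2) - abs(m - 16) / 16 - abs(tau - 100) / 100


# Total number of configurations the exhaustive search visits.
NUM_CONFIGS = len(K_VALUES) * len(M_VALUES) * len(TAU_VALUES)

best_config, best_score = grid_search(toy_score)
```

With these value lists the search visits 4 × 5 × 7 = 140 configurations, so an exhaustive sweep remains feasible as long as one development-set decoding pass is cheap.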