Simple and Scalable Nearest Neighbor Machine Translation
Authors: Yuhan Dai, Zhirui Zhang, Qiuzhi Liu, Qu Cui, Weihua Li, Yichao Du, Tong Xu
ICLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on machine translation in two general settings, static domain adaptation, and online learning, demonstrate that our proposed approach not only achieves almost 90% speed as the NMT model without performance degradation, but also significantly reduces the storage requirements of kNN-MT. |
| Researcher Affiliation | Collaboration | Yuhan Dai, Zhirui Zhang, Qiuzhi Liu, Qu Cui, Weihua Li, Yichao Du and Tong Xu; University of Science and Technology of China; Tencent AI Lab |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Our code is open-sourced on https://github.com/dirkiedai/sk-mt. |
| Open Datasets | Yes | For the domain adaptation task, we use the same multi-domain dataset as the baseline (Khandelwal et al., 2021) and consider IT, Medical, Koran, and Law in our experiments. The statistics of the dataset are shown in Appendix A.1. |
| Dataset Splits | Yes | Table 6 (statistics of the multi-domain dataset): Train sents — Koran 18k, IT 223k, Medical 248k, Law 467k; Dev sents — 2,000 per domain; Test sents — 2,000 per domain. |
| Hardware Specification | Yes | The hardware we use is 112 cores of Intel(R) Xeon(R) Gold 6258R CPU and a single GeForce RTX 2080 Ti GPU. |
| Software Dependencies | No | The paper mentions software like THUMT, FAISS, Elastic Search, and Moses toolkit, but does not provide specific version numbers for these or other ancillary software components. |
| Experiment Setup | Yes | During inference, we carefully tune the hyper-parameters on the development set by performing a grid search on k ∈ {1, 2, 3, 4}, m ∈ {1, 2, 4, 8, 16} and τ ∈ {5, 10, 20, 50, 100, 150, 200}. Based on the validation results, we select two configurations: m = 2, k = 1 as SK-MT1 and m = 16, k = 2 as SK-MT2, where the temperature τ is set to 100 in both. The beam size and length penalty are set to 4 and 0.6 for all datasets. |
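The grid search described in the Experiment Setup row can be sketched as follows. The candidate values for k, m, and τ are copied from the quote above; the scoring function passed in is a hypothetical stand-in for evaluating dev-set BLEU, which the paper does not specify in code form.

```python
from itertools import product

# Hyper-parameter grids reported in the paper's setup.
K_VALUES = [1, 2, 3, 4]
M_VALUES = [1, 2, 4, 8, 16]
TAU_VALUES = [5, 10, 20, 50, 100, 150, 200]


def grid_search(score_fn):
    """Return the (k, m, tau) triple with the highest dev-set score.

    `score_fn(k, m, tau)` is a placeholder for running decoding on the
    development set and measuring BLEU with that configuration.
    """
    best_config, best_score = None, float("-inf")
    for k, m, tau in product(K_VALUES, M_VALUES, TAU_VALUES):
        score = score_fn(k, m, tau)
        if score > best_score:
            best_config, best_score = (k, m, tau), score
    return best_config, best_score
```

For example, a scoring function whose maximum lies at k = 1, m = 2, τ = 100 would make `grid_search` return the SK-MT1 configuration reported in the paper.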