Simple and Scalable Nearest Neighbor Machine Translation

Authors: Yuhan Dai, Zhirui Zhang, Qiuzhi Liu, Qu Cui, Weihua Li, Yichao Du, Tong Xu

ICLR 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments on machine translation in two general settings, static domain adaptation and online learning, demonstrate that our proposed approach not only achieves almost 90% of the speed of the NMT model without performance degradation, but also significantly reduces the storage requirements of kNN-MT.
Researcher Affiliation | Collaboration | Yuhan Dai, Zhirui Zhang, Qiuzhi Liu, Qu Cui, Weihua Li, Yichao Du and Tong Xu; University of Science and Technology of China; Tencent AI Lab
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks.
Open Source Code | Yes | Our code is open-sourced on https://github.com/dirkiedai/sk-mt.
Open Datasets | Yes | For the domain adaptation task, we use the same multi-domain dataset as the baseline (Khandelwal et al., 2021) and consider IT, Medical, Koran, and Law in our experiments. The statistics of the dataset are shown in Appendix A.1.
Dataset Splits | Yes | Table 6: The statistics of the multi-domain dataset.

                Koran   IT      Medical  Law
  Train Sents   18k     223k    248k     467k
  Dev Sents     2000    2000    2000     2000
  Test Sents    2000    2000    2000     2000
Hardware Specification | Yes | The hardware we use is 112 cores of an Intel(R) Xeon(R) Gold 6258R CPU and a single GeForce RTX 2080 Ti GPU.
Software Dependencies | No | The paper mentions software such as THUMT, FAISS, Elasticsearch, and the Moses toolkit, but does not provide specific version numbers for these or other ancillary software components.
Experiment Setup | Yes | During inference, we carefully tune the hyper-parameters on the development set by performing a grid search on k ∈ {1, 2, 3, 4}, m ∈ {1, 2, 4, 8, 16} and τ ∈ {5, 10, 20, 50, 100, 150, 200}. Based on the validation results, we adopt two configurations in our experiments: m = 2, k = 1 as SK-MT1 and m = 16, k = 2 as SK-MT2, with the temperature τ set to 100 in both. The beam size and length penalty are set to 4 and 0.6 for all datasets.
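The grid search described above can be sketched as follows. This is a minimal illustration, not the authors' code: `evaluate_bleu` is a hypothetical user-supplied function that decodes the development set with a given (k, m, τ) and returns its BLEU score.

```python
from itertools import product

def grid_search(evaluate_bleu):
    """Exhaustively search the hyper-parameter grid reported in the paper.

    evaluate_bleu(k, m, tau) -> dev-set BLEU score (hypothetical callback;
    in practice it would run SK-MT decoding on the development set).
    """
    grid = product(
        [1, 2, 3, 4],                      # k: number of retrieved neighbors
        [1, 2, 4, 8, 16],                  # m: number of retrieved sentences
        [5, 10, 20, 50, 100, 150, 200],    # tau: softmax temperature
    )
    best_k, best_m, best_tau = max(grid, key=lambda cfg: evaluate_bleu(*cfg))
    return {"k": best_k, "m": best_m, "tau": best_tau}
```

Under this search, the paper's SK-MT2 configuration corresponds to the returned setting {"k": 2, "m": 16, "tau": 100}.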