reproducibilityindex.ai

Simple and Scalable Nearest Neighbor Machine Translation

Authors: Yuhan Dai, Zhirui Zhang, Qiuzhi Liu, Qu Cui, Weihua Li, Yichao Du, Tong Xu

ICLR 2023

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Experiments on machine translation in two general settings, static domain adaptation and online learning, demonstrate that our proposed approach not only achieves almost 90% of the speed of the NMT model without performance degradation, but also significantly reduces the storage requirements of kNN-MT.
Researcher Affiliation	Collaboration	Yuhan Dai, Zhirui Zhang, Qiuzhi Liu, Qu Cui, Weihua Li, Yichao Du and Tong Xu; University of Science and Technology of China; Tencent AI Lab
Pseudocode	No	The paper does not contain structured pseudocode or algorithm blocks.
Open Source Code	Yes	Our code is open-sourced on https://github.com/dirkiedai/sk-mt.
Open Datasets	Yes	For the domain adaptation task, we use the same multi-domain dataset as the baseline (Khandelwal et al., 2021) and consider IT, Medical, Koran, and Law in our experiments. The statistics of the dataset are shown in Appendix A.1.
Dataset Splits	Yes	Table 6: The statistics of the multi-domain dataset. Train sentences: Koran 18k, IT 223k, Medical 248k, Law 467k. Dev sentences: 2,000 per domain. Test sentences: 2,000 per domain.
Hardware Specification	Yes	The hardware we use is 112 cores of Intel(R) Xeon(R) Gold 6258R CPU and a single GeForce RTX 2080 Ti GPU.
Software Dependencies	No	The paper mentions software such as THUMT, FAISS, Elasticsearch, and the Moses toolkit, but does not provide specific version numbers for these or other ancillary software components.
Experiment Setup	Yes	During inference, we carefully tune the hyper-parameters on the development set by performing a grid search over k ∈ {1, 2, 3, 4}, m ∈ {1, 2, 4, 8, 16} and τ ∈ {5, 10, 20, 50, 100, 150, 200}. Based on the validation results, we select two widely used configurations for our experiments: m = 2, k = 1 as SK-MT1 and m = 16, k = 2 as SK-MT2, with the temperature τ set to 100 for both. The beam size and length penalty are set to 4 and 0.6 for all datasets.
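The reported search space can be made concrete with a short sketch. The Python below is an illustration under stated assumptions, not the authors' tuning script: it enumerates every (k, m, τ) combination from the grids listed above (4 × 5 × 7 = 140 settings) and keeps the one with the highest development-set score. The callback dev_score and the function grid_search are hypothetical names; a real run would decode the dev set with each setting and compute a metric such as BLEU.

```python
from itertools import product
from typing import Callable, Dict, Tuple

# Search grids as reported in the experiment setup.
K_VALUES = [1, 2, 3, 4]
M_VALUES = [1, 2, 4, 8, 16]
TAU_VALUES = [5, 10, 20, 50, 100, 150, 200]

Config = Tuple[int, int, int]  # (k, m, tau)


def grid_search(
    dev_score: Callable[[int, int, int], float],
) -> Tuple[Config, Dict[Config, float]]:
    """Score every (k, m, tau) combination and return the best one.

    `dev_score` is a hypothetical callback (an assumption, not part of the
    released code) that decodes the development set with the given setting
    and returns a quality score where higher is better.
    """
    scores: Dict[Config, float] = {}
    for k, m, tau in product(K_VALUES, M_VALUES, TAU_VALUES):
        scores[(k, m, tau)] = dev_score(k, m, tau)
    # Highest-scoring configuration; ties resolve to the first one enumerated.
    best = max(scores, key=lambda cfg: scores[cfg])
    return best, scores


if __name__ == "__main__":
    # Made-up scorer that peaks at the SK-MT2 setting purely for
    # demonstration; it does not reflect any real dev-set results.
    def toy(k: int, m: int, tau: int) -> float:
        return -abs(tau - 100) - abs(m - 16) - abs(k - 2)

    best, scores = grid_search(toy)
    print(len(scores), best)  # 140 (2, 16, 100)
```

Returning the full score table, not only the best setting, leaves room to pick more than one configuration, as the paper does with SK-MT1 and SK-MT2.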