Federated Nearest Neighbor Machine Translation
Authors: Yichao Du, Zhirui Zhang, Bingzhe Wu, Lemao Liu, Tong Xu, Enhong Chen
ICLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments show that FedNN significantly reduces computational and communication costs compared with FedAvg, while maintaining promising translation performance in different FL settings. |
| Researcher Affiliation | Collaboration | University of Science and Technology of China; State Key Laboratory of Cognitive Intelligence; Tencent AI Lab. Contact: duyichao@mail.ustc.edu.cn, {tongxu, cheneh}@ustc.edu.cn, zrustc11@gmail.com, {bingzhewu, redmondliu}@tencent.com |
| Pseudocode | No | The paper includes a workflow diagram (Figure 1) but does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Our code is open-sourced on https://github.com/duyichao/FedNN-MT. |
| Open Datasets | Yes | We adopt the WMT14 En-De data (Bojar et al., 2014) and the multi-domain En-De dataset (Koehn & Knowles, 2017) to simulate two typical FL scenarios for model evaluation: 1) the non-independent and identically distributed (Non-IID) setting, where each client holds data from a different domain; 2) the independent and identically distributed (IID) setting, where each client contains the same data distribution drawn from all domains (see the data-partition sketch after this table). |
| Dataset Splits | Yes | Table 3: The statistics of datasets for server and clients. Server WMT14 ... Dev 45,206 ... Client IT ... Dev 2,000 |
| Hardware Specification | Yes | We train all models with 4 Tesla-V100 GPUs and set patience to 5 to select the best checkpoint on the validation set. |
| Software Dependencies | No | The paper mentions software such as FAIRSEQ, the Adam optimizer, FAISS, the Moses toolkit, and sacreBLEU, but it does not specify version numbers for these components, which are required for reproducibility. |
| Experiment Setup | Yes | The input embedding size of the transformer layer is 512, the FFN layer dimension is 2048, and the number of self-attention heads is 8. During training, we deploy the Adam optimizer (Kingma & Ba, 2015) with a learning rate of 5e-4 and 4K warm-up updates to optimize model parameters. Both label smoothing coefficient and dropout rate are set to 0.1. The batch size is set to 16K tokens. |
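
The Non-IID/IID simulation noted in the Open Datasets row can be pictured with a short data-partitioning sketch. Everything below is an assumption for illustration: the domain list, the client count, and the `load_domain` helper are hypothetical and are not taken from the paper or the released FedNN-MT code.

```python
# Hedged sketch of the two simulated FL data settings (Non-IID vs. IID).
# Domain names, client count, and load_domain are illustrative placeholders.
import random

DOMAINS = ["it", "koran", "law", "medical", "subtitles"]  # example multi-domain En-De splits

def load_domain(name):
    # Hypothetical loader; in practice this would read the (src, tgt)
    # sentence pairs of one domain from disk.
    return [(f"{name} source {i}", f"{name} target {i}") for i in range(100)]

def non_iid_clients():
    # Non-IID: each client holds data from a single, distinct domain.
    return {f"client_{i}": load_domain(d) for i, d in enumerate(DOMAINS)}

def iid_clients(num_clients=5, seed=0):
    # IID: pool all domains, shuffle, and deal out equal-sized shards so
    # every client sees the same mixture of domains.
    pool = [pair for d in DOMAINS for pair in load_domain(d)]
    random.Random(seed).shuffle(pool)
    return {f"client_{i}": pool[i::num_clients] for i in range(num_clients)}
```

Under this sketch, `non_iid_clients()` corresponds to the one-domain-per-client condition, while `iid_clients()` gives every client the same domain mixture; the actual per-split sizes are those reported in Table 3 of the paper.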
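
The hyperparameters in the Experiment Setup row correspond to a standard Transformer-base recipe. Below is a minimal PyTorch sketch of that configuration; the inverse-square-root schedule and the Adam betas of (0.9, 0.98) are assumptions commonly paired with 4K warm-up updates rather than values stated in the row above, and the paper itself trains with FAIRSEQ, not hand-rolled code like this.

```python
# Minimal PyTorch sketch of the reported training configuration.
# The scheduler shape and Adam betas are assumptions; the paper uses FAIRSEQ.
import torch
import torch.nn as nn

model = nn.Transformer(
    d_model=512,           # input embedding size
    nhead=8,               # number of self-attention heads
    dim_feedforward=2048,  # FFN layer dimension
    dropout=0.1,
)

optimizer = torch.optim.Adam(model.parameters(), lr=5e-4, betas=(0.9, 0.98))

def inverse_sqrt_factor(step, warmup=4000):
    # Linear warm-up for `warmup` updates, then 1/sqrt(step) decay,
    # expressed as a multiplier on the peak learning rate of 5e-4.
    step = max(step, 1)
    if step < warmup:
        return step / warmup
    return (warmup / step) ** 0.5

scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, inverse_sqrt_factor)

# Label smoothing of 0.1 as reported; batches would be built up to
# roughly 16K tokens per update in the paper's setup.
criterion = nn.CrossEntropyLoss(label_smoothing=0.1)
```

Calling `scheduler.step()` after every optimizer update reproduces the 4K-step warm-up behaviour described in the row above.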