Federated Nearest Neighbor Machine Translation

Authors: Yichao Du, Zhirui Zhang, Bingzhe Wu, Lemao Liu, Tong Xu, Enhong Chen

ICLR 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments show that FedNN significantly reduces computational and communication costs compared with FedAvg, while maintaining promising translation performance in different FL settings.
Researcher Affiliation | Collaboration | University of Science and Technology of China; State Key Laboratory of Cognitive Intelligence; Tencent AI Lab. Contacts: duyichao@mail.ustc.edu.cn; {tongxu, cheneh}@ustc.edu.cn; zrustc11@gmail.com; {bingzhewu, redmondliu}@tencent.com
Pseudocode | No | The paper includes a workflow diagram (Figure 1) but does not contain any structured pseudocode or algorithm blocks (a generic kNN-MT decoding sketch is given below the table for orientation).
Open Source Code | Yes | Our code is open-sourced on https://github.com/duyichao/FedNN-MT.
Open Datasets | Yes | We adopt WMT14 En-De data (Bojar et al., 2014) and the multi-domain En-De dataset (Koehn & Knowles, 2017) to simulate two typical FL scenarios for model evaluation: 1) the non-independently and identically distributed (Non-IID) setting, where each client holds data from a different domain; 2) the independently and identically distributed (IID) setting, where each client has the same data distribution drawn from all domains. (A client-partition sketch follows the table.)
Dataset Splits | Yes | Table 3 ("The statistics of datasets for server and clients"): Server WMT14 ... Dev 45,206 ...; Client IT ... Dev 2,000.
Hardware Specification | Yes | We train all models with 4 Tesla-V100 GPUs and set patience to 5 to select the best checkpoint on the validation set.
Software Dependencies | No | The paper mentions software such as FAIRSEQ, the Adam optimizer, FAISS, the Moses toolkit, and sacreBLEU, but it does not specify version numbers for these components, which reproducibility requires (see the version-recording sketch below the table).
Experiment Setup | Yes | The input embedding size of the transformer layer is 512, the FFN layer dimension is 2048, and the number of self-attention heads is 8. During training, we deploy the Adam optimizer (Kingma & Ba, 2015) with a learning rate of 5e-4 and 4K warm-up updates to optimize model parameters. Both the label smoothing coefficient and the dropout rate are set to 0.1. The batch size is set to 16K tokens. (A configuration sketch follows the table.)
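
Regarding the Pseudocode row: since the paper ships only a workflow diagram, the following is a minimal sketch of the generic kNN-MT decoding rule (Khandelwal et al., 2021) that nearest-neighbor MT systems such as FedNN build on. It is not the paper's own algorithm; the function name and the values of lam and temperature are illustrative assumptions.

```python
import numpy as np

def knn_mt_interpolate(nmt_probs, neighbor_tokens, neighbor_dists,
                       vocab_size, lam=0.4, temperature=10.0):
    """Mix the NMT distribution with a retrieval distribution built from the
    k nearest datastore entries, in the spirit of kNN-MT.

    nmt_probs:        (vocab_size,) softmax output of the NMT model at this step
    neighbor_tokens:  (k,) target tokens stored with the retrieved datastore keys
    neighbor_dists:   (k,) distances between the decoder query and the retrieved keys
    lam, temperature: illustrative values, not taken from the paper
    """
    # Turn distances into normalized retrieval weights.
    weights = np.exp(-np.asarray(neighbor_dists, dtype=np.float64) / temperature)
    weights /= weights.sum()

    # Scatter-add the weights onto the vocabulary positions of the retrieved tokens.
    knn_probs = np.zeros(vocab_size)
    np.add.at(knn_probs, np.asarray(neighbor_tokens, dtype=np.int64), weights)

    # Interpolate the two distributions.
    return lam * knn_probs + (1.0 - lam) * np.asarray(nmt_probs)
```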
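Regarding the Open Datasets row: a minimal sketch of how a multi-domain corpus could be split across clients to mimic the Non-IID and IID settings described above. The function and variable names are hypothetical, and the Non-IID branch assumes whole domains are assigned to clients (one domain per client when their counts match).

```python
import random
from collections import defaultdict

def partition_clients(examples_by_domain, num_clients, iid, seed=0):
    """Split a multi-domain corpus across FL clients.

    examples_by_domain: dict mapping a domain name (e.g. "IT", "Medical") to
                        a list of sentence pairs; names here are illustrative.
    Non-IID: whole domains are assigned to clients, so domains are never mixed.
    IID:     all domains are pooled, shuffled, and split evenly across clients.
    """
    rng = random.Random(seed)
    clients = defaultdict(list)

    if iid:
        pool = [ex for exs in examples_by_domain.values() for ex in exs]
        rng.shuffle(pool)
        for i, ex in enumerate(pool):
            clients[i % num_clients].append(ex)
    else:
        for i, (_, exs) in enumerate(sorted(examples_by_domain.items())):
            clients[i % num_clients].extend(exs)

    return dict(clients)
```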
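Regarding the Software Dependencies row: a small sketch for recording the package versions of a reproduction environment, since the paper does not pin them. The package names (fairseq, torch, faiss-cpu, sacrebleu) are assumptions about how the mentioned tools are typically installed from PyPI; adjust them to the actual environment (e.g. faiss-gpu), and note that the Moses toolkit is a standalone tool not covered by this snippet.

```python
# Print installed versions of the Python packages behind the tools the paper mentions.
from importlib.metadata import version, PackageNotFoundError

for pkg in ["fairseq", "torch", "faiss-cpu", "sacrebleu"]:
    try:
        print(f"{pkg}=={version(pkg)}")
    except PackageNotFoundError:
        print(f"{pkg}: not installed")
```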
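Regarding the Experiment Setup row: a minimal PyTorch sketch wiring the quoted hyperparameters together. The use of torch.nn.Transformer (the paper uses FAIRSEQ), the layer count defaults, the Adam betas (0.9, 0.98), and the inverse-square-root warm-up schedule are assumptions of this sketch, and the 16K-token batching is omitted.

```python
import torch
from torch import nn

# Hyperparameters quoted in the Experiment Setup row.
D_MODEL, FFN_DIM, HEADS = 512, 2048, 8
DROPOUT, LABEL_SMOOTHING = 0.1, 0.1
LR, WARMUP_UPDATES = 5e-4, 4000

# Core encoder-decoder stack with the stated dimensions (embeddings/vocab omitted).
model = nn.Transformer(d_model=D_MODEL, nhead=HEADS,
                       dim_feedforward=FFN_DIM, dropout=DROPOUT)

criterion = nn.CrossEntropyLoss(label_smoothing=LABEL_SMOOTHING)
optimizer = torch.optim.Adam(model.parameters(), lr=LR, betas=(0.9, 0.98))

def inverse_sqrt(step):
    # Linear warm-up for WARMUP_UPDATES steps, then decay proportional to step^-0.5;
    # the schedule itself is an assumption commonly paired with 4K warm-up updates.
    step = max(step, 1)
    return min(step / WARMUP_UPDATES, (WARMUP_UPDATES / step) ** 0.5)

scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda=inverse_sqrt)
```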