FedSpeed: Larger Local Interval, Less Communication Round, and Higher Generalization Accuracy

Authors: Yan Sun, Li Shen, Tiansheng Huang, Liang Ding, Dacheng Tao

ICLR 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Moreover, we conduct extensive experiments on the real-world dataset to demonstrate the efficiency of our proposed FedSpeed, which performs significantly faster and achieves the state-of-the-art (SOTA) performance on the general FL experimental settings than several baselines including FedAvg, FedProx, FedCM, FedAdam, SCAFFOLD, FedDyn, FedADMM, etc.
Researcher Affiliation | Collaboration | Yan Sun (The University of Sydney, ysun9899@uni.sydney.edu.au); Li Shen (JD Explore Academy, mathshenli@gmail.com); Tiansheng Huang (Georgia Institute of Technology, tianshenghuangscut@gmail.com); Liang Ding (JD Explore Academy, liangding.liam@gmail.com); Dacheng Tao (JD Explore Academy & The University of Sydney, dacheng.tao@gmail.com)
Pseudocode | Yes | Algorithm 1: FedSpeed Algorithm Framework (a simplified local-step sketch follows this table).
Open Source Code | No | The paper does not provide any concrete access information (e.g., a specific link, an explicit statement of code release, or mention of code in supplementary materials) for the source code of the methodology described.
Open Datasets | Yes | We test the experiments on CIFAR-10, CIFAR-100 (Krizhevsky et al., 2009) and the Tiny ImageNet dataset. Due to the space limitations we introduce these datasets in the Appendix. (A minimal data-loading sketch follows this table.)
Dataset Splits | No | The paper specifies training and test data sizes (e.g., 'CIFAR-10 dataset contains 50,000 training data and 10,000 test data'), but does not explicitly provide details for a separate validation split or a predefined train/validation/test partitioning.
Hardware Specification | Yes | We test the time on the A100-SXM4-40GB GPU and show the performance in Table B.3.3.
Software Dependencies | No | The paper does not provide specific version numbers for software dependencies used in the experiments (e.g., 'Python 3.8, PyTorch 1.9, and CUDA 11.1').
Experiment Setup | Yes | Implementation details. We select each hyper-parameter within an appropriate range and present the combinations under the best performance. To fairly compare the baseline methods, we fix most hyper-parameters for all methods under the same setting. For 10% participation out of 100 total clients, we set the local learning rate to 0.1 initially and the global learning rate to 1.0 for all methods except FedAdam, which applies 0.1 on the global server. The learning rate decay is set to a multiplicative factor of 0.998 per communication round, except for FedDyn, FedADMM and FedSpeed, which apply 0.9995. Each active local client trains 5 epochs with batch size 50. Weight decay is set to 1e-3 for all methods. The weight for the prox-term in FedProx, FedDyn, FedADMM and FedSpeed is set to 0.1. (These values are collected into a configuration sketch after this table.)
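
The paper's Algorithm 1 is not reproduced on this page, so the following is only a hedged, simplified sketch of what one FedSpeed-style local step could look like in PyTorch: a prox term toward the received global model, a SAM-style gradient perturbation, and a client-kept prox-correction variable. The function name, argument names (h_i, rho, alpha), and the exact signs and scaling of the correction term are assumptions; the authoritative update rules are those in Algorithm 1 of the paper.

```python
import torch

def fedspeed_local_step(model, loss_fn, batch, global_params, h_i,
                        lr=0.1, lam=0.1, rho=0.1, alpha=1.0):
    """Simplified FedSpeed-style local step (a sketch, not the verified Algorithm 1).

    global_params: list of tensors received from the server this round.
    h_i:           client-side prox-correction term, same shapes as the parameters.
    lam:           prox-term weight (0.1 in the quoted setup).
    rho, alpha:    perturbation radius and gradient-merging weight (assumed names).
    """
    x, y = batch
    params = list(model.parameters())

    # 1) plain stochastic gradient at the current local point
    loss = loss_fn(model(x), y)
    grads = torch.autograd.grad(loss, params)

    # 2) SAM-like step: evaluate the gradient again at a perturbed point
    grad_norm = torch.sqrt(sum((g ** 2).sum() for g in grads)) + 1e-12
    with torch.no_grad():
        for p, g in zip(params, grads):
            p.add_(rho * g / grad_norm)            # ascend to the perturbed point
    loss_pert = loss_fn(model(x), y)
    grads_pert = torch.autograd.grad(loss_pert, params)
    with torch.no_grad():
        for p, g in zip(params, grads):
            p.sub_(rho * g / grad_norm)            # undo the perturbation

    # 3) merge the two gradients, add the prox term and the correction, then descend
    with torch.no_grad():
        for p, g, gp, w, h in zip(params, grads, grads_pert, global_params, h_i):
            d = alpha * gp + (1.0 - alpha) * g     # merged quasi-gradient
            d = d + lam * (p - w) - h              # prox term toward w, minus correction
            p.sub_(lr * d)
```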
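The datasets themselves are public, so a minimal loading sketch is straightforward. The snippet below fetches CIFAR-10 via torchvision (the quoted 50,000 training and 10,000 test images) and splits the training indices across 100 simulated clients. The uniform random split and the normalization statistics are illustrative assumptions; the paper's actual heterogeneous client partition is described in its appendix.

```python
import numpy as np
import torchvision
import torchvision.transforms as T

# Commonly used CIFAR-10 statistics; the normalization choice is illustrative.
transform = T.Compose([
    T.ToTensor(),
    T.Normalize((0.4914, 0.4822, 0.4465), (0.2470, 0.2435, 0.2616)),
])
train_set = torchvision.datasets.CIFAR10("./data", train=True,
                                         download=True, transform=transform)
test_set = torchvision.datasets.CIFAR10("./data", train=False,
                                        download=True, transform=transform)

# Split the 50,000 training samples across 100 simulated clients.
# A uniform random split is assumed here purely for illustration.
num_clients = 100                                   # 10% are sampled per round
rng = np.random.default_rng(0)
client_indices = np.array_split(rng.permutation(len(train_set)), num_clients)

print(len(train_set), len(test_set), len(client_indices[0]))   # 50000 10000 500
```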
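For reference, the hyper-parameters quoted in the Experiment Setup row can be gathered into a single configuration. The values come directly from the excerpt above; the dictionary keys and the helper function are illustrative names, not the authors' code.

```python
# Hyper-parameters quoted in the Experiment Setup row, collected into one place.
fedspeed_config = {
    "num_clients": 100,
    "participation_rate": 0.10,     # 10% of the 100 clients active per round
    "local_lr": 0.1,
    "global_lr": 1.0,               # FedAdam instead uses 0.1 on the server
    "lr_decay_per_round": 0.9995,   # 0.998 for methods other than FedDyn/FedADMM/FedSpeed
    "local_epochs": 5,
    "batch_size": 50,
    "weight_decay": 1e-3,
    "prox_weight": 0.1,             # prox-term weight for FedProx/FedDyn/FedADMM/FedSpeed
}

def local_lr_at_round(t: int) -> float:
    """Multiplicative learning-rate decay applied once per communication round."""
    return fedspeed_config["local_lr"] * fedspeed_config["lr_decay_per_round"] ** t
```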