Acceleration of Federated Learning with Alleviated Forgetting in Local Training

Authors: Chencheng Xu, Zhiwei Hong, Minlie Huang, Tao Jiang

ICLR 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our comprehensive experiments demonstrate that FedReg not only significantly improves the convergence rate of FL, especially when the neural network architecture is deep and the clients' data are extremely non-i.i.d., but is also able to better protect privacy in classification problems and is more robust against gradient inversion attacks.
Researcher Affiliation | Academia | 1 BNRIST, Tsinghua University, Beijing 100084, China; 2 Department of Computer Science and Technology, Tsinghua University, Beijing 100084, China; 3 Department of Computer Science and Engineering, UCR, CA 92521, USA. {xucc18, hzw17}@mails.tsinghua.edu.cn, aihuang@tsinghua.edu.cn, jiang@cs.ucr.edu
Pseudocode | Yes | The pseudocode of FedReg is provided in Appendix D.
Open Source Code | Yes | The code is available at: https://github.com/Zoesgithub/FedReg.
Open Datasets | Yes | FedReg is evaluated on MNIST (Deng, 2012), EMNIST (Cohen et al., 2017), CIFAR-10 and CIFAR-100 (Krizhevsky et al., 2009), and CT images of COVID-19 (He, 2020).
Dataset Splits | Yes | To simulate a scenario for FL, the training data in each dataset are split across multiple clients in different ways, and the performance of the trained model is evaluated on the test data. The data preparation steps for each dataset are described below, and more experimental details are provided in Appendix B. [...] The learning rates for the different methods are optimized by grid search, the optimal weight for the proximal term in FedProx is searched among {1.0, 0.1, 0.01, 0.001}, the optimal λ for FedCurv is searched among {0.001, 0.0001, 0.00001}, and the hyper-parameters (γ and ηs) in FedReg are optimized by grid search as well. (A hedged sketch of a non-i.i.d. client partition is given after this table.)
Hardware Specification | Yes | The GPUs are NVIDIA GTX 1080 Ti.
Software Dependencies | No | The paper mentions specific optimizers (Adam) and normalization techniques (group normalization) but does not provide version numbers for software dependencies such as the programming language, deep-learning libraries (e.g., PyTorch, TensorFlow), or CUDA.
Experiment Setup | Yes | In all the experiments, the learning rates for the different methods are optimized by grid search, the optimal weight for the proximal term in FedProx is searched among {1.0, 0.1, 0.01, 0.001}, the optimal λ for FedCurv is searched among {0.001, 0.0001, 0.00001}, and the hyper-parameters (γ and ηs) in FedReg are optimized by grid search as well. The number of epochs in the local training stage is optimized for FedAvg and applied to the other methods. (A hedged sketch of such a grid search is given after this table.)
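
The Dataset Splits row notes that the training data are partitioned across clients "in different ways". Below is a minimal sketch of one common non-i.i.d. partition (label sharding in the style of the original FedAvg experiments); it is an illustrative assumption, not FedReg's exact preprocessing, which the paper describes in Appendix B.

```python
# Minimal sketch of a label-sharded, non-i.i.d. client partition (an assumption
# for illustration; the paper's own splits are described in its Appendix B).
import numpy as np

def shard_by_label(labels, num_clients, shards_per_client=2, seed=0):
    """Sort sample indices by label, cut them into contiguous shards, and hand
    each client a few shards so that every client sees only a few classes."""
    rng = np.random.default_rng(seed)
    order = np.argsort(labels)                          # indices grouped by class
    shards = np.array_split(order, num_clients * shards_per_client)
    rng.shuffle(shards)                                 # random shard-to-client assignment
    return [np.concatenate(shards[i::num_clients]) for i in range(num_clients)]

# Example: 100 clients over 60k samples with 10 classes (MNIST-sized, placeholder labels).
labels = np.random.randint(0, 10, size=60000)
client_indices = shard_by_label(labels, num_clients=100)
print(len(client_indices), len(client_indices[0]))      # 100 clients, ~600 samples each
```

A partition like this makes each client's label distribution highly skewed, which is the "extremely non-i.i.d." regime the paper targets.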
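The Experiment Setup row describes tuning by grid search. The sketch below shows how such a search could be organized, assuming a hypothetical `train_and_evaluate(method, config)` helper that runs one federated training job and returns test accuracy. The FedProx and FedCurv grids are the values quoted above; the learning-rate, γ, and ηs grids are illustrative placeholders, not the paper's actual search ranges.

```python
# Minimal sketch of the hyper-parameter grid search described in the table.
# `train_and_evaluate` is a hypothetical helper, not part of the FedReg code.
from itertools import product

GRIDS = {
    "FedProx": {"lr": [0.1, 0.01, 0.001], "prox_weight": [1.0, 0.1, 0.01, 0.001]},
    "FedCurv": {"lr": [0.1, 0.01, 0.001], "lambda": [1e-3, 1e-4, 1e-5]},
    "FedReg":  {"lr": [0.1, 0.01, 0.001],            # placeholder grid
                "gamma": [0.2, 0.4, 0.8],            # placeholder grid
                "eta_s": [0.1, 1.0]},                # placeholder grid
}

def grid_search(method, train_and_evaluate):
    """Try every combination in the method's grid and keep the best accuracy."""
    names = list(GRIDS[method])
    best_config, best_acc = None, float("-inf")
    for values in product(*(GRIDS[method][name] for name in names)):
        config = dict(zip(names, values))
        acc = train_and_evaluate(method, config)     # hypothetical training call
        if acc > best_acc:
            best_config, best_acc = config, acc
    return best_config, best_acc
```

A dummy evaluator, e.g. `grid_search("FedProx", lambda m, cfg: 0.0)`, is enough to check that the search loop itself runs; the per-method grids mirror the fact that each baseline has its own regularization weight to tune.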