Acceleration of Federated Learning with Alleviated Forgetting in Local Training
Authors: Chencheng Xu, Zhiwei Hong, Minlie Huang, Tao Jiang
ICLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our comprehensive experiments demonstrate that FedReg not only significantly improves the convergence rate of FL, especially when the neural network architecture is deep and the clients' data are extremely non-i.i.d., but also protects privacy better in classification problems and is more robust against gradient inversion attacks. |
| Researcher Affiliation | Academia | BNRIST, Tsinghua University, Beijing 100084, China; Department of Computer Science and Technology, Tsinghua University, Beijing 100084, China; Department of Computer Science and Engineering, UCR, CA 92521, USA. {xucc18, hzw17}@mails.tsinghua.edu.cn, aihuang@tsinghua.edu.cn, jiang@cs.ucr.edu |
| Pseudocode | Yes | The pseudo-code of FedReg is provided in Appendix D. |
| Open Source Code | Yes | The code is available at: https://github.com/Zoesgithub/FedReg. |
| Open Datasets | Yes | FedReg is evaluated on MNIST (Deng, 2012), EMNIST (Cohen et al., 2017), CIFAR-10 and CIFAR-100 (Krizhevsky et al., 2009), and CT images of COVID-19 (He, 2020). |
| Dataset Splits | Yes | To simulate a scenario for FL, the training data in each dataset are split into multiple clients in different ways and the performance of the trained model is evaluated on the test data. The data preparation steps for each dataset are described below and more experimental details are provided in Appendix B. [...] The learning rates for different methods are optimized by grid search, the optimal weight for the proximal term in FedProx is searched among {1.0, 0.1, 0.01, 0.001}, the optimal λ for FedCurv is searched among {0.001, 0.0001, 0.00001}, and the hyper-parameters (γ and ηs) in FedReg are optimized by grid search as well. (A sketch of one way to simulate such a non-i.i.d. client split is given after this table.) |
| Hardware Specification | Yes | The GPUs are 1080 Ti. |
| Software Dependencies | No | The paper mentions using specific optimizers (Adam) and normalization techniques (group-normalization) but does not provide specific version numbers for software dependencies such as programming languages, libraries (e.g., PyTorch, TensorFlow), or CUDA versions. |
| Experiment Setup | Yes | In all the experiments, the learning rates for different methods are optimized by grid search, the optimal weight for the proximal term in FedProx is searched among {1.0, 0.1, 0.01, 0.001}, the optimal λ for FedCurv is searched among {0.001, 0.0001, 0.00001}, and the hyper-parameters (γ and ηs) in FedReg are optimized by grid search as well. The number of epochs in the local training stage is optimized in FedAvg and applied to the other methods. (A sketch of how such a grid search could be organized is given after this table.) |
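
As referenced in the Dataset Splits row, the paper partitions each training set across clients in dataset-specific ways that are detailed in its Appendix B (not quoted here). The sketch below shows one common way to simulate a non-i.i.d. split, by sorting examples by label and dealing contiguous shards to clients; the function name `shard_split`, the shard scheme, and the parameter values are illustrative assumptions, not the authors' exact procedure.

```python
import numpy as np

def shard_split(labels, num_clients, shards_per_client=2, seed=0):
    """Simulate a non-i.i.d. federated split by sorting examples by label
    and dealing contiguous shards to clients (a common heuristic; the
    paper's exact per-dataset splits are described in its Appendix B)."""
    rng = np.random.default_rng(seed)
    order = np.argsort(labels)                       # group indices by class
    shards = np.array_split(order, num_clients * shards_per_client)
    shard_ids = rng.permutation(len(shards))         # deal shards at random
    clients = []
    for c in range(num_clients):
        ids = shard_ids[c * shards_per_client:(c + 1) * shards_per_client]
        clients.append(np.concatenate([shards[i] for i in ids]))
    return clients                                   # one index array per client

# Example: a 60k-example label array (MNIST-sized) split across 100 clients
labels = np.random.randint(0, 10, size=60_000)
client_indices = shard_split(labels, num_clients=100)
```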
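As referenced in the Experiment Setup row, the quoted passage describes a grid search over learning rates and method-specific hyper-parameters (the proximal-term weight in FedProx, λ in FedCurv, and γ, ηs in FedReg). The sketch below shows how such a search might be organized; `run_federated_training`, the learning-rate grid, and the γ/ηs candidate values are hypothetical, since the paper only states the FedProx and FedCurv grids explicitly.

```python
import itertools

# Hypothetical training entry point; the actual one lives in the authors'
# repository (https://github.com/Zoesgithub/FedReg).
def run_federated_training(method, lr, **hparams):
    return 0.0  # placeholder test accuracy

LEARNING_RATES = [0.1, 0.03, 0.01, 0.003, 0.001]     # assumed grid; the paper
                                                      # only says "grid search"
GRIDS = {
    "FedAvg":  {},
    "FedProx": {"mu": [1.0, 0.1, 0.01, 0.001]},       # proximal-term weight (from the paper)
    "FedCurv": {"lam": [1e-3, 1e-4, 1e-5]},           # λ grid (from the paper)
    "FedReg":  {"gamma": [0.1, 0.5, 0.9],             # assumed γ candidates
                "eta_s": [0.1, 0.01, 0.001]},         # assumed ηs candidates
}

best = {}
for method, grid in GRIDS.items():
    keys = list(grid)
    for lr in LEARNING_RATES:
        for values in itertools.product(*(grid[k] for k in keys)):
            hparams = dict(zip(keys, values))
            acc = run_federated_training(method, lr, **hparams)
            if acc > best.get(method, (-1,))[0]:
                best[method] = (acc, lr, hparams)     # keep the best setting per method
```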