FedFed: Feature Distillation against Data Heterogeneity in Federated Learning
Authors: Zhiqin Yang, Yonggang Zhang, Yu Zheng, Xinmei Tian, Hao Peng, Tongliang Liu, Bo Han
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Comprehensive experiments demonstrate the efficacy of FedFed in promoting model performance. The code is publicly available at: https://github.com/tmlr-group/FedFed. We deploy FedFed on four popular FL algorithms, including FedAvg [4], FedProx [6], SCAFFOLD [7], and FedNova [22]. Atop them, we conduct comprehensive experiments on various scenarios regarding different amounts of clients, varying degrees of heterogeneity, and four datasets. Extensive results show that the FedFed achieves considerable performance gains in all settings. Our contributions are summarized as follows: 3. We conduct comprehensive experiments to show that FedFed consistently and significantly enhances the convergence rate and generalization performance of FL models across different scenarios under various datasets (Sec 4.2). |
| Researcher Affiliation | Academia | Zhiqin Yang (1,2), Yonggang Zhang (2), Yu Zheng (3), Xinmei Tian (5), Hao Peng (1,6), Tongliang Liu (4), Bo Han (2); 1 Beihang University, 2 Hong Kong Baptist University, 3 Chinese University of Hong Kong, 4 Sydney AI Centre, The University of Sydney, 5 University of Science and Technology of China, 6 Kunming University of Science and Technology |
| Pseudocode | Yes | Algorithm 1 summarizes the procedure of feature distillation. Pseudo-code of how to apply FedFed are listed in Appendix B. Algorithm 2 FedAvg/FedProx with FedFed. Algorithm 3 SCAFFOLD with FedFed. Algorithm 4 FedNova with FedFed. |
| Open Source Code | Yes | The code is publicly available at: https://github.com/tmlr-group/FedFed |
| Open Datasets | Yes | Following previous works [10, 29], we conduct experiments over CIFAR-10, CIFAR-100 [30], Fashion-MNIST (FMNIST) [31], and SVHN [32]. Following [5], we employ latent Dirichlet sampling (LDA) [33] to simulate Non-IID distribution. |
| Dataset Splits | No | The paper mentions setting up Non-IID distributions with specific alpha values for datasets and notes batch sizes and local epochs. However, it does not provide explicit training/validation/test dataset splits (e.g., percentages or counts) or refer to standard predefined splits for reproducibility beyond the dataset names themselves. |
| Hardware Specification | Yes | Besides, all experiments are performed on Python 3.8, 36 core 3.00GHz Intel Core i9 CPU, and NVIDIA RTX A6000 GPUs. |
| Software Dependencies | No | The paper mentions "Python 3.8" but does not list specific versions for other key software components, libraries, or frameworks (e.g., PyTorch, TensorFlow) that would be essential for reproducibility. |
| Experiment Setup | Yes | We use ResNet-18 [35] both in the feature distillation and classifier in FL. Table 7: The values of all parameters in this paper. Federated Learning Relevant: α = 0.1/0.05 (heterogeneity degree); T_d = 15 (communication rounds of feature distillation); T_r = 1,000 (communication rounds of classifier training); E_d = 1 (local epochs of feature distillation); E/E_r = 1/5 (local epochs of classifier training); σ_s² = 0.15 (DP noise level, added to x_s); \|C_t\|/\|C_r\| = 5/10 (#selected clients every communication round); K = 10/100 (#clients of federated system). Training Process Relevant: η_k = 0.01/0.001/0.0001 (learning rate); B = 32/64 (batch size); M = 0.9 (momentum); wd = 0.0001 (weight decay for regularization). |
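
The Open Datasets row cites latent Dirichlet sampling (LDA) with concentration α = 0.1/0.05 to simulate Non-IID client splits. The snippet below is a minimal sketch of that standard per-class Dirichlet partitioning scheme, not the authors' code; the function name `dirichlet_partition` and the NumPy-based implementation are assumptions.

```python
import numpy as np

def dirichlet_partition(labels, num_clients=10, alpha=0.1, seed=0):
    """Split sample indices across clients with a per-class Dirichlet prior.

    labels: 1-D integer array of class labels (e.g., CIFAR-10 targets).
    alpha:  concentration parameter; smaller values give more skewed clients.
    Returns one array of sample indices per client.
    """
    rng = np.random.default_rng(seed)
    num_classes = int(labels.max()) + 1
    client_indices = [[] for _ in range(num_clients)]
    for c in range(num_classes):
        idx_c = np.where(labels == c)[0]
        rng.shuffle(idx_c)
        # Fraction of class c that each client receives.
        proportions = rng.dirichlet(alpha * np.ones(num_clients))
        cut_points = (np.cumsum(proportions)[:-1] * len(idx_c)).astype(int)
        for client_id, shard in enumerate(np.split(idx_c, cut_points)):
            client_indices[client_id].extend(shard.tolist())
    return [np.array(idx) for idx in client_indices]
```

For instance, `dirichlet_partition(np.array(train_set.targets), num_clients=10, alpha=0.1)` on a torchvision CIFAR-10 `train_set` would mirror the K = 10, α = 0.1 setting quoted above.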
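The Experiment Setup row reports a DP noise level of σ_s² = 0.15 added to the shared performance-sensitive features x_s. The sketch below only illustrates that protection step under stated assumptions (PyTorch tensors, Gaussian noise, 0.15 read as a variance); it is not the authors' implementation.

```python
import torch

def protect_shared_features(x_s: torch.Tensor, noise_level: float = 0.15) -> torch.Tensor:
    """Add zero-mean Gaussian noise to the performance-sensitive features x_s
    before sharing them globally. Treating the quoted 0.15 as the noise
    variance is an assumption; reading it as the standard deviation is also
    plausible.
    """
    return x_s + torch.randn_like(x_s) * noise_level ** 0.5
```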
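For convenience, the Table 7 values quoted in the Experiment Setup row can be collected into a single configuration. The dictionary below is only a restatement of those numbers; the key names are assumptions, not identifiers from the released code.

```python
# Restatement of the Table 7 values quoted above; key names are assumptions.
FEDFED_CONFIG = {
    "alpha": (0.1, 0.05),                     # LDA heterogeneity degree
    "rounds_feature_distillation": 15,        # T_d
    "rounds_classifier": 1000,                # T_r
    "local_epochs_distillation": 1,           # E_d
    "local_epochs_classifier": (1, 5),        # E / E_r
    "dp_noise_level": 0.15,                   # sigma_s^2, added to x_s
    "clients_per_round": (5, 10),             # |C_t| / |C_r|
    "num_clients": (10, 100),                 # K
    "learning_rate": (0.01, 0.001, 0.0001),   # eta_k
    "batch_size": (32, 64),                   # B
    "momentum": 0.9,                          # M
    "weight_decay": 1e-4,                     # wd
}
```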