Why Go Full? Elevating Federated Learning Through Partial Network Updates
Authors: Haolin Wang, Xuefeng Liu, Jianwei Niu, Wenkai Guo, Shaojie Tang
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our theoretical analysis and experimental results show that the FedPart method significantly outperforms traditional full-network update strategies, achieving faster convergence, greater accuracy, and reduced communication and computational overhead. Experimentally, we perform extensive evaluations on various datasets and model architectures. The results indicate that the FedPart method significantly improves convergence speed and final performance (e.g., an improvement of 24.8% on Tiny-ImageNet with ResNet-18), while also reducing both communication overhead (by 85%) and computational overhead (by 27%) simultaneously. Furthermore, our ablation experiments demonstrate how each of the proposed strategies contributes to enhancing the overall performance of FedPart. We also conduct comprehensive visualization experiments to illustrate the underlying rationale of FedPart. |
| Researcher Affiliation | Academia | State Key Laboratory of Virtual Reality Technology and Systems, School of Computer Science and Engineering, Beihang University, Beijing, China; Center for AI Business Innovation, Department of Management Science and Systems, University at Buffalo, Buffalo, New York, USA; Zhongguancun Laboratory, Beijing, China. {wanghaolin, liu_xuefeng, niujianwei, kyeguo}@buaa.edu.cn; shaojiet@buffalo.edu |
| Pseudocode | No | The paper does not contain a figure, block, or section labeled 'Pseudocode' or 'Algorithm'. |
| Open Source Code | Yes | The source code is available at: https://github.com/FLAIR-Community/Fling |
| Open Datasets | Yes | We conduct experiments on the CIFAR-10 [Krizhevsky et al., 2010], CIFAR-100 [Krizhevsky et al., 2009], and Tiny ImageNet [Le and Yang, 2015] datasets. We also extend the FedPart method to the field of natural language processing and evaluate it on the AG News and Sogou News [Zhang et al., 2015] datasets. |
| Dataset Splits | No | The paper mentions 'training datasets' and testing the 'global model on a balanced set', implying separate train and test sets, but it does not explicitly describe a validation split or give percentages for any of the splits. |
| Hardware Specification | Yes | All experiments are conducted on a server equipped with 8 A100 GPUs |
| Software Dependencies | No | The paper mentions using the Adam optimizer but does not specify version numbers for programming languages, machine learning frameworks, or other software libraries used in the experiments. |
| Experiment Setup | Yes | In the experimental setup, we primarily choose 40 clients, with local epochs set to 8. We utilize the Adam optimizer [Kingma and Ba, 2014] with a learning rate of 0.001, which is determined to be the optimal learning rate. In line with prior references [Li et al., 2021b, Chen et al., 2022], we refrain from uploading local statistical information during model aggregation. Each experiment is conducted three times with different random seeds to ensure robustness. |
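
The Research Type row above summarizes FedPart's central idea: in each round, clients train and upload only a subset of the network's layers rather than the full model. The sketch below illustrates one such partial-update round in PyTorch; the helper names (`select_trainable_layers`, `client_update`) and the layer-selection schedule are illustrative assumptions, not FedPart's actual implementation (the official code is in the Fling repository linked above).

```python
# Minimal sketch of a partial-network update round, assuming a PyTorch model.
# The layer-selection schedule and helper names are illustrative placeholders,
# not FedPart's exact API.
import torch
import torch.nn as nn


def select_trainable_layers(model: nn.Module, layer_names: set) -> None:
    """Freeze every parameter except those belonging to the selected layers."""
    for name, param in model.named_parameters():
        param.requires_grad = any(name.startswith(layer) for layer in layer_names)


def client_update(model: nn.Module, loader, layer_names: set, epochs: int = 8):
    """Train only the selected layers locally and return just their weights."""
    select_trainable_layers(model, layer_names)
    optimizer = torch.optim.Adam(
        (p for p in model.parameters() if p.requires_grad), lr=1e-3
    )
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for x, y in loader:
            optimizer.zero_grad()
            loss_fn(model(x), y).backward()
            optimizer.step()
    # Only the selected layers are returned for upload, which is what shrinks
    # per-round communication and computation relative to full-network updates.
    return {
        name: param.detach().clone()
        for name, param in model.named_parameters()
        if param.requires_grad
    }
```

Since only the trainable subset is optimized and returned, both the backward pass and the upload payload scale with the selected layers rather than the whole network, which is consistent with the communication and computation savings reported above.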
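
The Experiment Setup row translates into a small configuration skeleton, shown below under the assumption that PyTorch and torchvision are used; only the quoted hyperparameters (40 clients, 8 local epochs, Adam with a 0.001 learning rate, three seeds) come from the paper, while the dataset call, transform, and paths are illustrative assumptions.

```python
# Rough reconstruction of the reported experimental setup.
# Hyperparameters come from the table above; dataset choice, transforms,
# and paths are assumptions for illustration.
import torch
import torchvision
import torchvision.transforms as T

NUM_CLIENTS = 40
LOCAL_EPOCHS = 8
LEARNING_RATE = 1e-3   # reported as the optimal learning rate
SEEDS = [0, 1, 2]      # each experiment is repeated with three random seeds

transform = T.Compose([T.ToTensor()])
train_set = torchvision.datasets.CIFAR10(
    root="./data", train=True, download=True, transform=transform
)
test_set = torchvision.datasets.CIFAR10(
    root="./data", train=False, download=True, transform=transform
)


def make_optimizer(model: torch.nn.Module) -> torch.optim.Optimizer:
    """Adam optimizer with the learning rate reported in the paper."""
    return torch.optim.Adam(model.parameters(), lr=LEARNING_RATE)


for seed in SEEDS:
    torch.manual_seed(seed)
    # Build the global model, partition train_set across NUM_CLIENTS clients
    # (the non-IID partitioning scheme is not spelled out here), and run
    # federated rounds with LOCAL_EPOCHS local epochs per client.
```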