Why Go Full? Elevating Federated Learning Through Partial Network Updates

Authors: Haolin Wang, Xuefeng Liu, Jianwei Niu, Wenkai Guo, Shaojie Tang

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our theoretical analysis and experimental results show that the FedPart method significantly outperforms traditional full-network update strategies, achieving faster convergence, greater accuracy, and reduced communication and computational overhead. Experimentally, we perform extensive evaluations on various datasets and model architectures. The results indicate that the FedPart method significantly improves convergence speed and final performance (e.g., an improvement of 24.8% on Tiny-ImageNet with ResNet-18), while also reducing both communication overhead (by 85%) and computational overhead (by 27%). Furthermore, our ablation experiments demonstrate how each of the proposed strategies contributes to the overall performance of FedPart. We also conduct comprehensive visualization experiments to illustrate the underlying rationale of FedPart. (A minimal partial-update sketch follows the table.)
Researcher Affiliation | Academia | State Key Laboratory of Virtual Reality Technology and Systems, School of Computer Science and Engineering, Beihang University, Beijing, China; Center for AI Business Innovation, Department of Management Science and Systems, University at Buffalo, Buffalo, New York, USA; Zhongguancun Laboratory, Beijing, China. Emails: {wanghaolin, liu_xuefeng, niujianwei, kyeguo}@buaa.edu.cn, shaojiet@buffalo.edu
Pseudocode | No | The paper does not contain a figure, block, or section labeled 'Pseudocode' or 'Algorithm'.
Open Source Code | Yes | The source code is available at: https://github.com/FLAIR-Community/Fling
Open Datasets | Yes | We conduct experiments on the CIFAR-10 [Krizhevsky et al., 2010], CIFAR-100 [Krizhevsky et al., 2009], and Tiny ImageNet [Le and Yang, 2015] datasets. We also extend the FedPart method to natural language processing and evaluate it on the AG News and Sogou News [Zhang et al., 2015] datasets. (A hedged data-loading example follows the table.)
Dataset Splits | No | The paper mentions 'training datasets' and testing the 'global model on a balanced set', implying train and test sets, but it does not explicitly describe a validation split or give percentages for any of the splits.
Hardware Specification | Yes | All experiments are conducted on a server equipped with 8 A100 GPUs.
Software Dependencies | No | The paper mentions using the Adam optimizer but does not specify version numbers for programming languages, machine learning frameworks, or other software libraries used in the experiments.
Experiment Setup | Yes | In the experimental setup, we primarily choose 40 clients, with the number of local epochs set to 8. We utilize the Adam optimizer [Kingma and Ba, 2014] with a learning rate of 0.001, which is determined to be the optimal learning rate. In line with prior references [Li et al., 2021b, Chen et al., 2022], we refrain from uploading local statistical information during model aggregation. Each experiment is conducted three times with different random seeds to ensure robustness. (A configuration sketch follows the table.)
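
The Research Type row summarizes FedPart's core idea: updating and communicating only part of the network in each round. The snippet below is a purely illustrative sketch of that idea, not the authors' implementation; the round-robin layer schedule and the helper names (`set_trainable_layers`, `trainable_state_dict`) are assumptions introduced here. It freezes every parameter outside the selected layer group, so only those parameters are trained locally and would need to be uploaded.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet18

def set_trainable_layers(model: nn.Module, active_prefixes):
    """Freeze every parameter except those whose name starts with an active prefix."""
    for name, param in model.named_parameters():
        param.requires_grad = any(name.startswith(p) for p in active_prefixes)

def trainable_state_dict(model: nn.Module):
    """Collect only the parameters a client would need to upload this round."""
    return {name: param.detach().clone()
            for name, param in model.named_parameters() if param.requires_grad}

# Hypothetical round-robin schedule over ResNet-18 blocks (an assumption,
# not the paper's layer-selection strategy).
schedule = [["layer1"], ["layer2"], ["layer3"], ["layer4", "fc"]]

model = resnet18(num_classes=200)  # e.g., Tiny-ImageNet has 200 classes
for round_idx in range(4):
    set_trainable_layers(model, schedule[round_idx % len(schedule)])
    # ... local training on each client would run here ...
    upload = trainable_state_dict(model)
    print(f"round {round_idx}: uploading {len(upload)} tensors")
```

Because only the selected parameters are trained and transmitted, a scheme of this shape reduces both computation and communication relative to full-network updates, which is the trade-off the row above reports.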
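
The Open Datasets row lists CIFAR-10, CIFAR-100, Tiny ImageNet, AG News, and Sogou News. As a hedged example of obtaining the vision datasets, the snippet below uses the standard torchvision loaders; the download path, normalization statistics, and Tiny ImageNet directory layout are assumptions, not details taken from the paper.

```python
import torchvision
import torchvision.transforms as T

# Basic normalization with commonly used CIFAR statistics; the paper's exact
# preprocessing pipeline is not reproduced here.
transform = T.Compose([
    T.ToTensor(),
    T.Normalize(mean=(0.4914, 0.4822, 0.4465), std=(0.2470, 0.2435, 0.2616)),
])

cifar10_train = torchvision.datasets.CIFAR10(
    root="./data", train=True, download=True, transform=transform)
cifar100_train = torchvision.datasets.CIFAR100(
    root="./data", train=True, download=True, transform=transform)

# Tiny ImageNet has no built-in torchvision loader; a common approach is
# ImageFolder over a locally downloaded copy (directory layout is an assumption).
tiny_imagenet = torchvision.datasets.ImageFolder(
    root="./data/tiny-imagenet-200/train", transform=T.ToTensor())
```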
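
The Experiment Setup row reports 40 clients, 8 local epochs, the Adam optimizer with a learning rate of 0.001, and three runs with different random seeds. The sketch below shows one way those hyperparameters could be wired into a client's local-update loop; the seed values, data pipeline, and client sampling are placeholders rather than the Fling implementation.

```python
import random
import torch
import torch.nn as nn

# Hyperparameters reported in the Experiment Setup row.
NUM_CLIENTS = 40
LOCAL_EPOCHS = 8
LEARNING_RATE = 1e-3
SEEDS = [0, 1, 2]  # three runs; the actual seed values are an assumption

def local_update(model: nn.Module, loader, device="cpu"):
    """One client's local training pass; only trainable parameters are optimized."""
    optimizer = torch.optim.Adam(
        (p for p in model.parameters() if p.requires_grad), lr=LEARNING_RATE)
    criterion = nn.CrossEntropyLoss()
    model.train()
    for _ in range(LOCAL_EPOCHS):
        for inputs, targets in loader:
            optimizer.zero_grad()
            loss = criterion(model(inputs.to(device)), targets.to(device))
            loss.backward()
            optimizer.step()
    return model.state_dict()

for seed in SEEDS:
    torch.manual_seed(seed)
    random.seed(seed)
    # ... build the global model, partition data across NUM_CLIENTS clients,
    # and run federated rounds that call local_update() per sampled client ...
```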