Efficient Personalized Federated Learning via Sparse Model-Adaptation
Authors: Daoyuan Chen, Liuyi Yao, Dawei Gao, Bolin Ding, Yaliang Li
ICML 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate the proposed pFedGate on four FL benchmarks compared to several SOTA methods. We show that pFedGate achieves superior global accuracy, individual accuracy and efficiency simultaneously (up to 4.53% average accuracy improvement with 12x smaller sparsity than the compared strongest PFL method). We demonstrate the effectiveness and robustness of pFedGate in the partial clients participation and novel clients participation scenarios. We find that pFedGate can learn meaningful sparse local models adapted to different data distributions, and conduct extensive experiments to study the effect of sparsity and verify the necessity and effectiveness of pFedGate components. |
| Researcher Affiliation | Industry | Alibaba Group. Correspondence to: Yaliang Li <yaliang.li@alibaba-inc.com>. |
| Pseudocode | Yes | We summarize the overall algorithm in Algorithm 1. Besides, we present more details about (1) the gradients flow via the gating layer, which contains a knapsack solver; and (2) the global model aggregation. |
| Open Source Code | Yes | Our codes are at https://github.com/alibaba/FederatedScope/tree/master/benchmark/pFL-Bench. |
| Open Datasets | Yes | We adopt four widely used FL datasets in our experiments: EMNIST (Cohen et al., 2017), FEMNIST (Caldas et al., 2018), CIFAR10 and CIFAR100 (Krizhevsky, 2009). |
| Dataset Splits | Yes | All datasets are randomly split into train/valid/test sets with a ratio 6:2:2. |
| Hardware Specification | Yes | We implement all models with PyTorch, and run experiments on Tesla V100 and NVIDIA GeForce GTX 1080 Ti GPUs. |
| Software Dependencies | No | The paper mentions 'We implement all models with PyTorch' but does not specify version numbers for PyTorch or any other software dependencies. |
| Experiment Setup | Yes | For each method on each dataset, we use the SGD optimizer and grid search the learning rate η_g from [0.005, 0.01, 0.03, 0.05, 0.1, 0.3, 0.5], set the communication round T = 400, the batch size as 128 and the local update step as 1 epoch. For pFedGate, the learning rate of gating layer η is searched from [0.01, 0.05, 0.1, 0.3, 0.5, 1, 1.5], and we set the block size splitting factor B = 5 for all evaluated models. |
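
The Pseudocode row above quotes the paper's mention of a gating layer whose block-selection step is handled by a knapsack solver. As a rough illustration only, and not the authors' implementation, the sketch below shows a block-wise gate that scores parameter blocks and keeps the highest-scoring ones under a sparsity budget; a greedy value-per-cost selection stands in for the paper's knapsack solver, and names such as `BlockGate` and `select_blocks` are hypothetical.

```python
# Illustrative sketch of block-wise sparse model-adaptation (not the paper's code).
# Greedy selection is used as a stand-in for the knapsack solver mentioned in the paper.
import torch
import torch.nn as nn


class BlockGate(nn.Module):
    """Predicts one gate score per parameter block from a batch of inputs."""

    def __init__(self, in_features: int, num_blocks: int):
        super().__init__()
        self.scorer = nn.Linear(in_features, num_blocks)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Average over the batch so all examples share one set of block scores.
        return torch.sigmoid(self.scorer(x.flatten(1))).mean(dim=0)


def select_blocks(scores: torch.Tensor, block_sizes: torch.Tensor, sparsity: float):
    """Keep high-score blocks until the parameter budget (sparsity * total size) is used up."""
    budget = sparsity * block_sizes.sum()
    keep = torch.zeros_like(scores)
    used = 0.0
    for idx in torch.argsort(scores, descending=True):
        if used + block_sizes[idx] <= budget:
            keep[idx] = 1.0
            used += float(block_sizes[idx])
    return keep  # each parameter block b would then be scaled by keep[b] * scores[b]
```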
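For the Dataset Splits row, the following is a minimal sketch of the reported 6:2:2 random split, assuming each client's local data is a PyTorch `Dataset`; the helper name and seed handling are illustrative.

```python
# Minimal sketch of a 6:2:2 train/valid/test split per client (illustrative helper).
import torch
from torch.utils.data import Dataset, random_split


def split_client_data(dataset: Dataset, seed: int = 0):
    n = len(dataset)
    n_train = int(0.6 * n)
    n_valid = int(0.2 * n)
    n_test = n - n_train - n_valid  # remainder goes to test to absorb rounding
    generator = torch.Generator().manual_seed(seed)
    return random_split(dataset, [n_train, n_valid, n_test], generator=generator)
```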
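For the Experiment Setup row, the grid values and fixed hyperparameters below are copied from the quoted text, while the surrounding wiring (separate SGD optimizers for the shared model and the gating layer, the trial loop) is a schematic assumption rather than the authors' training script.

```python
# Schematic reproduction of the reported search space and fixed settings.
import itertools
import torch

MODEL_LR_GRID = [0.005, 0.01, 0.03, 0.05, 0.1, 0.3, 0.5]   # eta_g, shared model
GATE_LR_GRID = [0.01, 0.05, 0.1, 0.3, 0.5, 1.0, 1.5]       # eta, gating layer

CONFIG = dict(rounds=400, batch_size=128, local_epochs=1, block_factor=5)


def make_optimizers(model, gate, lr_model, lr_gate):
    # Separate SGD optimizers for the shared model and the personalized gate.
    return (torch.optim.SGD(model.parameters(), lr=lr_model),
            torch.optim.SGD(gate.parameters(), lr=lr_gate))


for lr_model, lr_gate in itertools.product(MODEL_LR_GRID, GATE_LR_GRID):
    pass  # run one federated training trial per grid point; keep the best on validation
```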