Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Fusion of Global and Local Knowledge for Personalized Federated Learning

Authors: Tiansheng Huang, Li Shen, Yan Sun, Weiwei Lin, Dacheng Tao

TMLR 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Empirically, we conduct extensive experiments to validate the performance of FedSLR. Specifically, our experiments on CIFAR-100 verify that: (i) the Global Knowledge Representation (GKR) can better represent the global knowledge (the GKR model achieves 7.23% higher accuracy compared to FedAvg), and the mixed models can significantly improve the model accuracy of personalized tasks (the mixed model produced by FedSLR achieves 3.52% higher accuracy compared to Ditto). (ii) Moreover, both GKR and the mixed models, which respectively represent the global and personalized knowledge, are more compact (the GKR model has 50.40% fewer parameters, while the mixed model prunes out 40.11% of parameters compared to a model without pruning). (iii) The downlink communication is lowered (38.34% less downlink communication in one session of FL). Theoretically, we establish the convergence property of GKR and the sparse personalized component, which shows that both components asymptotically converge to their stationary points under proper settings.
Researcher Affiliation | Collaboration | Tiansheng Huang (Georgia Institute of Technology); Li Shen (JD Explore Academy); Yan Sun (The University of Sydney); Weiwei Lin (South China University of Technology; Peng Cheng Laboratory); Dacheng Tao (JD Explore Academy)
Pseudocode | Yes | Algorithm 1: Federated learning with mixed Sparse and Low-Rank representation (FedSLR); Algorithm 2: FedSLR under partial participation; Algorithm 3: Memory-efficient FedSLR
Open Source Code | Yes | Source code is available at https://github.com/huangtiansheng/fedslr.
Open Datasets | Yes | Datasets. We conduct simulations on CIFAR-10/CIFAR-100/Tiny-ImageNet, with both IID and Non-IID data splitting.
Dataset Splits | Yes | We conduct simulations on CIFAR-10/CIFAR-100/Tiny-ImageNet, with both IID and Non-IID data splitting. Specifically, for IID splitting, data is split uniformly across all 100 clients. For Non-IID splitting, we use an α-Dirichlet distribution to split the data among the clients. Here α is set to 0.1 for all the Non-IID experiments. Details of the setting are given in Appendix B.1. Appendix B.1: There are in total M = 100 clients in the simulation. We split the training data to these 100 clients under IID and Non-IID settings. For the IID setting, data are uniformly sampled for each client. For the Non-IID setting, we use an α-Dirichlet distribution on the label ratios to ensure uneven label distributions among devices, as in (Hsu et al., 2019). The lower the distribution parameter α is, the more uneven the label distribution will be, and the more challenging the FL task. After the initial splitting of training data, we sample 100 pieces of testing data from the testing set for each client, with the same label ratio as its training data. Testing is performed on each client's own testing data, and the overall testing accuracy (which we refer to as Top-1 Acc in our experiments) is calculated as the average of all clients' testing accuracies. For all the baselines, we consistently use a 0.1 participation ratio, i.e., 10 out of 100 clients are randomly selected in each round.
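The α-Dirichlet Non-IID splitting described above can be sketched as follows. This is a minimal illustration in the style of (Hsu et al., 2019), not code from the paper's repository; the function name, seeding, and cut-point rounding are our own assumptions. It draws per-label client proportions from a symmetric Dirichlet(α) (via normalized Gamma(α, 1) samples, so only the standard library is needed) and partitions each label's sample indices accordingly:

```python
import random
from collections import defaultdict

def dirichlet_split(labels, num_clients=100, alpha=0.1, seed=0):
    """Partition sample indices across clients with per-label client
    proportions drawn from a symmetric Dirichlet(alpha).
    Lower alpha -> more skewed (more Non-IID) label distributions."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for idx, y in enumerate(labels):
        by_label[y].append(idx)
    client_indices = [[] for _ in range(num_clients)]
    for idxs in by_label.values():
        rng.shuffle(idxs)
        # A Dirichlet sample is a vector of Gamma(alpha, 1) draws, normalized.
        gammas = [rng.gammavariate(alpha, 1.0) for _ in range(num_clients)]
        total = sum(gammas)
        props = [g / total for g in gammas]
        # Turn cumulative proportions into cut points over this label's samples,
        # so every index is assigned to exactly one client.
        cuts, acc = [0], 0.0
        for p in props[:-1]:
            acc += p
            cuts.append(int(acc * len(idxs)))
        cuts.append(len(idxs))
        for c in range(num_clients):
            client_indices[c].extend(idxs[cuts[c]:cuts[c + 1]])
    return client_indices
```

With α = 0.1 (the paper's setting), most clients end up holding samples from only a few labels, which is what makes the Non-IID runs harder than the IID ones.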
Hardware Specification | Yes | We measure the inference latency of the low-rank GKR on a Tesla M60 GPU.
Software Dependencies | No | The paper mentions using an SGD optimizer and setting learning rates, but does not specify version numbers for any software libraries (e.g., Python, PyTorch, TensorFlow, CUDA).
Experiment Setup | Yes | We use an SGD optimizer with weight decay parameter 1e-3 for the local solver. The learning rate is initialized as 0.1 and decayed by 0.998 after each communication round. We simulate 100 clients in total, and 10 of them are picked for local training in each round. For all the global methods (i.e., FedSLR (GKR), FedDyn, FedAvg, SCAFFOLD), local epochs and batch size are fixed to 2 and 20. For FedSLR and Ditto, the local epoch used in local fusion is 1, also with batch size 20. For FedSLR, the proximal stepsize is ηg = 10, the low-rank penalty is λ = 0.0001, and the sparse penalty is µ = 0.001 in our main experiments.
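The reported setup above can be collected into a small configuration sketch. This is an illustrative summary of the stated hyperparameters, not the authors' code; the constant and function names, and the per-round seeding, are our own assumptions:

```python
import random

# Hyperparameters as reported in the experiment setup.
NUM_CLIENTS = 100
CLIENTS_PER_ROUND = 10       # 0.1 participation ratio
LOCAL_EPOCHS_GLOBAL = 2      # global methods (FedSLR (GKR), FedDyn, FedAvg, SCAFFOLD)
LOCAL_EPOCHS_FUSION = 1      # FedSLR / Ditto local fusion
BATCH_SIZE = 20
WEIGHT_DECAY = 1e-3          # SGD local solver
LR_INIT = 0.1
LR_DECAY = 0.998             # multiplicative decay per communication round
ETA_G = 10                   # FedSLR proximal stepsize
LAMBDA_LOW_RANK = 1e-4       # low-rank penalty
MU_SPARSE = 1e-3             # sparse penalty

def lr_at_round(t: int) -> float:
    """Learning rate after t communication rounds (geometric decay)."""
    return LR_INIT * (LR_DECAY ** t)

def sample_clients(round_seed: int):
    """Randomly pick the clients participating in one round."""
    rng = random.Random(round_seed)
    return rng.sample(range(NUM_CLIENTS), CLIENTS_PER_ROUND)
```

For example, after 100 rounds the learning rate has decayed to roughly 0.1 × 0.998^100 ≈ 0.082.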