Internal Cross-layer Gradients for Extending Homogeneity to Heterogeneity in Federated Learning

Authors: Yun-Hin Chan, Rui Zhou, Running Zhao, Zhihan Jiang, Edith C. H. Ngai

ICLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this section, we conduct comprehensive experiments aimed at demonstrating three fundamental aspects: (1) the efficacy of InCo Aggregation and its extensions for various FL methods (Section 5.2), (2) the robustness analysis and ablation study of InCo Aggregation (Section 5.3), and (3) in-depth analyses of the underlying principles behind InCo Aggregation (Section 5.4). Our code is released on GitHub. More experimental details and results can be found in Appendix H.
Researcher Affiliation | Academia | Yun-Hin Chan, Rui Zhou, Running Zhao, Zhihan Jiang & Edith C. H. Ngai, Department of Electrical and Electronic Engineering, The University of Hong Kong. {chanyunhin,zackery,rnzhao,zhjiang}@connect.hku.hk, chngai@eee.hku.hk
Pseudocode | Yes | Algorithm 1: InCo Aggregation (InCoAvg as the example); a hedged Python sketch of the server-side update follows the table.
  Require: Datasets D_k, k ∈ {1, ..., K}, K clients, and their weights w_1, ..., w_K.
  Ensure: Weights for all clients w_1, ..., w_K.
  Server process:
    while not converged do
      Receive g^t_{w_i} from each sampled client.
      Perform parameter aggregation on g^t_{w_i}.
      for each layer l_k in the server model do
        if l_k needs cross-layer gradients then
          ĝ^t_{l_k}, ĝ^t_{l_0} ← normalize g^t_{l_k} and g^t_{l_0}.
          Obtain θ^t, α, β from Theorem 3.1.
          g^{t+1}_{l_k} = (ĝ^t_{l_k} + θ^t ĝ^t_{l_0}) · (||g^t_{l_k}|| + ||g^t_{l_0}||) / 2.
        else
          g^{t+1}_{l_k} = g^t_{l_k}.
        end if
        w^{t+1}_{l_k} = w^t_{l_k} + g^{t+1}_{l_k}.
      end for
      Send the updated w^{t+1}_i to the sampled clients.
    end while
  Client processes:
    while random clients i, i ∈ {1, ..., K} do
      Receive model weights w^{t-1}_i.
      Update the client model from w^{t-1}_i to w^t_i.
      Send g^t_{w_i} = w^t_i - w^{t-1}_i to the server.
    end while
Open Source Code | Yes | Our code is released on GitHub: https://github.com/ChanYunHin/InCo-Aggregation
Open Datasets | Yes | We conduct experiments on Fashion-MNIST (Xiao et al., 2017), SVHN (Netzer et al., 2011), CIFAR-10 (Krizhevsky et al., 2009) and CINIC-10 (Darlow et al., 2018) under non-iid settings (a hedged non-IID partition sketch follows the table).
Dataset Splits | No | The paper mentions conducting experiments on various datasets and evaluating algorithms, but it does not explicitly provide details about a validation dataset split (e.g., percentages or counts for training, validation, and test sets).
Hardware Specification | Yes | We conduct our experiments with 4 NVIDIA GeForce RTX 3090s.
Software Dependencies | No | The paper mentions software such as PyTorch Image Models (timm) and the Adam optimizer but does not specify version numbers for these components, which are required for reproducibility.
Experiment Setup | Yes | We use the Adam optimizer with a learning rate of 0.001, β1 = 0.9 and β2 = 0.999 (the default parameter settings) for all methods with ResNets. The local training epochs are fixed to 5. The batch size is 64 for all experiments. Furthermore, the global communication rounds are 500 for ResNets and 200 for ViTs on all datasets. Global communication rounds for MOON and InCoMOON are 100 to prevent extreme overfitting on Fashion-MNIST. The hyper-parameter µ for FedProx and InCoProx is 0.05 for both ViTs and ResNets. (A configuration sketch collecting these values follows the table.)
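
The following is a minimal Python sketch of the server-side cross-layer gradient step quoted in the Pseudocode row (Algorithm 1). It is not the authors' implementation: the helper names (normalize, cross_layer_update), the use of flattened per-layer tensors, and the treatment of θ as a precomputed scalar are assumptions made for illustration only.

```python
import torch

def normalize(g: torch.Tensor, eps: float = 1e-12) -> torch.Tensor:
    """Scale a gradient tensor to unit L2 norm (hat{g} in Algorithm 1)."""
    return g / (g.norm() + eps)

def cross_layer_update(w_lk: torch.Tensor,
                       g_lk: torch.Tensor,
                       g_l0: torch.Tensor,
                       theta: float) -> torch.Tensor:
    """One server-side step for a layer l_k that receives cross-layer
    gradients from the shallow layer l_0.

    Assumption: theta is obtained elsewhere (Theorem 3.1 in the paper);
    here it is simply a scalar weight on the normalized shallow gradient.
    """
    # Keep the original gradient magnitudes before normalizing.
    avg_norm = (g_lk.norm() + g_l0.norm()) / 2
    # Combine normalized deep- and shallow-layer gradients, then rescale
    # by the average of the original norms, as in Algorithm 1.
    g_new = (normalize(g_lk) + theta * normalize(g_l0)) * avg_norm
    # Apply the aggregated cross-layer gradient to the server weights.
    return w_lk + g_new
```

Layers that do not need cross-layer gradients keep their aggregated gradient unchanged, i.e. w_lk + g_lk, matching the else branch of Algorithm 1.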
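The Open Datasets row states that the four image datasets are used under non-iid settings but the table does not restate the partition scheme. The sketch below uses a Dirichlet label-skew split, a common choice in this literature; the concentration parameter alpha and the helper name dirichlet_partition are illustrative assumptions, not the paper's stated protocol.

```python
import numpy as np

def dirichlet_partition(labels: np.ndarray, num_clients: int,
                        alpha: float = 0.5, seed: int = 0):
    """Split sample indices across clients with Dirichlet label skew.

    Assumption: alpha controls how skewed each client's label
    distribution is (smaller alpha -> more heterogeneous clients).
    """
    rng = np.random.default_rng(seed)
    num_classes = int(labels.max()) + 1
    client_indices = [[] for _ in range(num_clients)]
    for c in range(num_classes):
        idx_c = np.where(labels == c)[0]
        rng.shuffle(idx_c)
        # Proportion of class-c samples assigned to each client.
        props = rng.dirichlet([alpha] * num_clients)
        cuts = (np.cumsum(props) * len(idx_c)).astype(int)[:-1]
        for client_id, part in enumerate(np.split(idx_c, cuts)):
            client_indices[client_id].extend(part.tolist())
    return [np.array(ci) for ci in client_indices]
```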
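To make the Experiment Setup row easier to reuse, here is a hedged configuration sketch collecting the hyperparameters quoted there. The dataclass and field names are ours, and values not stated in the row (e.g., number of clients, client sampling rate) are deliberately omitted.

```python
from dataclasses import dataclass

@dataclass
class InCoTrainingConfig:
    """Hyperparameters quoted in the Experiment Setup row (field names are illustrative)."""
    optimizer: str = "adam"
    learning_rate: float = 1e-3
    beta1: float = 0.9
    beta2: float = 0.999
    local_epochs: int = 5
    batch_size: int = 64
    rounds_resnet: int = 500              # global communication rounds for ResNets
    rounds_vit: int = 200                 # global communication rounds for ViTs
    rounds_moon_fashion_mnist: int = 100  # MOON / InCoMOON on Fashion-MNIST
    fedprox_mu: float = 0.05              # µ for FedProx / InCoProx (ViTs and ResNets)
```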