Internal Cross-layer Gradients for Extending Homogeneity to Heterogeneity in Federated Learning
Authors: Yun-Hin Chan, Rui Zhou, Running Zhao, Zhihan Jiang, Edith C. H. Ngai
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we conduct comprehensive experiments aimed at demonstrating three fundamental aspects: (1) the efficacy of InCo Aggregation and its extensions for various FL methods (Section 5.2), (2) the robustness analysis and ablation study of InCo Aggregation (Section 5.3), and (3) in-depth analyses of the underlying principles behind InCo Aggregation (Section 5.4). Our codes are released on GitHub. More experimental details and results can be found in Appendix H. |
| Researcher Affiliation | Academia | Yun-Hin Chan, Rui Zhou, Running Zhao, Zhihan Jiang & Edith C.H. Ngai Department of Electrical and Electronic Engineering, The University of Hong Kong {chanyunhin,zackery,rnzhao,zhjiang}@connect.hku.hk, chngai@eee.hku.hk |
| Pseudocode | Yes | Algorithm 1 InCo Aggregation (InCoAvg as the example). Require: datasets D_k, k ∈ {1, ..., K}, K clients, and their weights w_1, ..., w_K. Ensure: weights for all clients w_1, ..., w_K. Server process: while not converged do: receive g^t_{w_i} from the sampled clients; aggregate the parameters g^t_{w_i}; for each layer l_k in the server model, if l_k needs cross-layer gradients then normalize g^t_{l_k} and g^t_{l_0}, obtain θ_t, α, β from Theorem 3.1, and set g^{t+1}_{l_k} = (g^t_{l_k} + θ_t g^t_{l_0}) · (||g^t_{l_k}|| + ||g^t_{l_0}||) / 2, else g^{t+1}_{l_k} = g^t_{l_k}; then w^{t+1}_{l_k} = w^t_{l_k} + g^{t+1}_{l_k}; finally, send the updated w^{t+1}_i to the sampled clients. Client processes: for each sampled client i ∈ {1, ..., K}: receive model weights w^{t-1}_i, update the client model from w^{t-1}_i to w^t_i, and send g^t_{w_i} = w^t_i − w^{t-1}_i to the server. (An illustrative sketch of the server-side step is given after the table.) |
| Open Source Code | Yes | Our codes are released on GitHub: https://github.com/ChanYunHin/InCo-Aggregation |
| Open Datasets | Yes | We conduct experiments on Fashion-MNIST (Xiao et al., 2017), SVHN (Netzer et al., 2011), CIFAR-10 (Krizhevsky et al., 2009) and CINIC-10 (Darlow et al., 2018) under non-iid settings. |
| Dataset Splits | No | The paper mentions conducting experiments on various datasets and evaluating algorithms, but it does not explicitly provide details about a validation dataset split (e.g., percentages or counts for training, validation, and test sets). |
| Hardware Specification | Yes | We conduct our experiments with 4 NVIDIA GeForce RTX 3090s. |
| Software Dependencies | No | The paper mentions software like 'PyTorch Image Models (timm)' and 'Adam optimizer' but does not specify version numbers for these software components, which is required for reproducibility. |
| Experiment Setup | Yes | We use the Adam optimizer with a learning rate of 0.001, β1 = 0.9 and β2 = 0.999 (the default parameter settings) for all methods with ResNets. The local training epochs are fixed to 5. The batch size is 64 for all experiments. Furthermore, the global communication rounds are 500 for ResNets and 200 for ViTs for all datasets. Global communication rounds for MOON and InCoMOON are 100 to prevent extreme overfitting on Fashion-MNIST. The hyper-parameter µ for FedProx and InCoProx is 0.05 for ViTs and ResNets. (A hedged sketch of this local-training configuration follows the table.) |
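
The Algorithm 1 quote in the Pseudocode row is a flattened extraction of the paper's listing. The Python sketch below illustrates only the server-side cross-layer gradient step; it is an illustrative reconstruction, not the authors' released code. The helper names (`combine_cross_layer`, `server_update`, `needs_cross_layer`) and the flat per-layer dictionary layout are assumptions, and θ_t is treated as a given scalar rather than derived from Theorem 3.1.

```python
# Illustrative sketch of the server-side cross-layer gradient step (Pseudocode row).
# NOT the authors' code: helper names, the per-layer dict layout, and a fixed
# theta_t (instead of the Theorem 3.1 derivation) are assumptions for this sketch.
import numpy as np

def normalize(g: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """Scale a gradient tensor to unit L2 norm."""
    return g / (np.linalg.norm(g) + eps)

def combine_cross_layer(g_lk: np.ndarray, g_l0: np.ndarray, theta_t: float) -> np.ndarray:
    """Normalize the layer gradient and the reference-layer gradient, mix them
    with weight theta_t, then rescale by the average of the original norms."""
    scale = (np.linalg.norm(g_lk) + np.linalg.norm(g_l0)) / 2.0
    mixed = normalize(g_lk) + theta_t * normalize(g_l0)
    return mixed * scale

def server_update(weights: dict, grads: dict, theta_t: float,
                  needs_cross_layer: dict, ref_layer: str) -> dict:
    """One server round: apply the cross-layer combination where flagged,
    then add the (possibly adjusted) gradients to the current weights."""
    g_l0 = grads[ref_layer]
    new_weights = {}
    for name, w in weights.items():
        g = grads[name]
        if needs_cross_layer.get(name, False):
            g = combine_cross_layer(g, g_l0, theta_t)
        new_weights[name] = w + g
    return new_weights

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    layers = ["layer0", "layer1", "layer2"]
    weights = {n: rng.normal(size=(4, 4)) for n in layers}
    grads = {n: rng.normal(size=(4, 4)) * 0.01 for n in layers}
    flags = {"layer1": True, "layer2": True}  # layer0 acts as the reference layer
    updated = server_update(weights, grads, theta_t=0.5,
                            needs_cross_layer=flags, ref_layer="layer0")
    print({n: float(np.linalg.norm(updated[n] - weights[n])) for n in layers})
```

Rescaling by the average of the two original norms mirrors the update in the quoted algorithm, keeping the combined gradient on roughly the same magnitude scale as the per-layer gradient it replaces.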
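
For the Experiment Setup row, the snippet below sketches one client's local update with the reported optimizer settings (Adam with lr = 0.001, β1 = 0.9, β2 = 0.999, batch size 64, 5 local epochs). The model and data are toy placeholders, and the federated orchestration, communication rounds, and FedProx/MOON-specific terms are omitted; it is a minimal sketch under those assumptions, not the authors' training code.

```python
# Minimal local-training sketch using the hyperparameters reported in the
# Experiment Setup row. The model and dataset are placeholders; FL-specific
# logic (client sampling, aggregation, proximal/contrastive terms) is omitted.
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

def local_train(model: nn.Module, dataset, local_epochs: int = 5,
                batch_size: int = 64, lr: float = 1e-3) -> nn.Module:
    """Run one client's local update with Adam (lr=0.001, betas=(0.9, 0.999))."""
    loader = DataLoader(dataset, batch_size=batch_size, shuffle=True)
    optimizer = torch.optim.Adam(model.parameters(), lr=lr, betas=(0.9, 0.999))
    criterion = nn.CrossEntropyLoss()
    model.train()
    for _ in range(local_epochs):
        for x, y in loader:
            optimizer.zero_grad()
            loss = criterion(model(x), y)
            loss.backward()
            optimizer.step()
    return model

if __name__ == "__main__":
    # Toy placeholder data standing in for flattened Fashion-MNIST-style inputs.
    x = torch.randn(256, 784)
    y = torch.randint(0, 10, (256,))
    model = nn.Sequential(nn.Linear(784, 128), nn.ReLU(), nn.Linear(128, 10))
    local_train(model, TensorDataset(x, y))
```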