Rethinking the Flat Minima Searching in Federated Learning

Authors: Taehwan Lee, Sung Whan Yoon

ICML 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this paper, we empirically and theoretically analyze the relationship between heterogeneity across clients and flatness discrepancy: a strong heterogeneity leads to severe discrepancy, eventually yielding the degraded performance of the global model. Based on the findings, we propose a method called Federated Learning for Global Flatness (FedGF) that relieves flatness discrepancy, leading to flatter minima of the global model. We empirically confirm that our method shows remarkable performance gains over prior flatness-searching FL methods, ranging up to +5.09% and +10.02% gains in the heterogeneous CIFAR-10 and CIFAR-100 benchmarks, respectively.
Researcher Affiliation | Academia | 1) Graduate School of Artificial Intelligence, Ulsan National Institute of Science and Technology (UNIST), Ulsan, South Korea; 2) Department of Electrical Engineering, UNIST, Ulsan, South Korea.
Pseudocode | Yes | A pseudocode of FedGF is provided in Appendix C. Algorithm 1 presents the algorithmic pseudocode of the training procedures of FedGF.
Open Source Code | Yes | Codes are available at github.com/hwan-sig/Official-FedGF
Open Datasets | Yes | We consider two datasets that are widely used in federated learning: CIFAR-10 and CIFAR-100 (Krizhevsky et al.).
Dataset Splits | No | The paper specifies training and test sets but does not explicitly mention a separate validation set or explain how hyperparameters were tuned.
Hardware Specification | Yes | We implemented the models based on the PyTorch framework (Paszke et al., 2019) and ran the experiments with NVIDIA A5000 and A6000 processors.
Software Dependencies | No | The paper mentions the "PyTorch framework (Paszke et al., 2019)" but does not provide specific version numbers for PyTorch or any other software libraries used, such as the public implementation of (Zeng et al., 2023).
Experiment Setup | Yes | We distributed 500 data samples per client, and the number of local updates per round is 8, with batch size 64. As done in (Hsu et al., 2020), the prior distribution of local data follows the Dirichlet distribution of α, i.e., α ∈ {0, 0.005, 10} for both CIFAR-10 and CIFAR-100 experiments. We set the local learning rate as ηl = 0.01, global learning rate as ηg = 1, weight decay as 0.0004, and batch size as 64.
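
For orientation on what the flatness-searching client update described in the Research Type row involves, below is a minimal sketch of a generic SAM-style local update that a federated server could aggregate. It is an illustration under assumed names (sam_local_update, rho, the returned weight-delta format), not the authors' FedGF procedure, whose exact algorithm is given in Algorithm 1 / Appendix C of the paper.

```python
# Generic sketch of a SAM-style flat-minima-seeking local update in FL.
# NOT the authors' FedGF algorithm; all names and defaults are illustrative.
import copy
import torch


def sam_local_update(global_model, loader, loss_fn, local_steps=8,
                     lr=0.01, rho=0.05, device="cpu"):
    """Run a few SAM-style local steps starting from the current global model."""
    model = copy.deepcopy(global_model).to(device)
    params = [p for p in model.parameters() if p.requires_grad]
    data_iter = iter(loader)

    for _ in range(local_steps):
        try:
            x, y = next(data_iter)
        except StopIteration:
            data_iter = iter(loader)
            x, y = next(data_iter)
        x, y = x.to(device), y.to(device)

        # 1) Ascent step: perturb the weights toward the locally sharpest direction.
        loss_fn(model(x), y).backward()
        grad_norm = torch.norm(
            torch.stack([p.grad.norm() for p in params if p.grad is not None]))
        eps = []
        with torch.no_grad():
            for p in params:
                e = (torch.zeros_like(p) if p.grad is None
                     else rho * p.grad / (grad_norm + 1e-12))
                p.add_(e)
                eps.append(e)
                p.grad = None

        # 2) Descent step: gradient at the perturbed point, applied to the
        #    original weights, so the client moves toward a flatter region.
        loss_fn(model(x), y).backward()
        with torch.no_grad():
            for p, e in zip(params, eps):
                p.sub_(e)                       # undo the ascent perturbation
                if p.grad is not None:
                    p.sub_(lr * p.grad)         # SGD step with the SAM gradient
                p.grad = None

    # Return the weight delta that the server would aggregate.
    global_state = global_model.state_dict()
    return {k: v.detach().cpu() - global_state[k].cpu()
            for k, v in model.state_dict().items()}
```

In a standard FedOpt-style server step, the reported global learning rate ηg = 1 corresponds to adding the plain average of the returned client deltas to the global weights.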
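
The Experiment Setup row partitions local data with a Dirichlet prior as in (Hsu et al., 2020). Below is a minimal sketch of that label-skew partitioning for a CIFAR-style labeled dataset; the function name, seed handling, and defaults are illustrative assumptions, and the paper's additional constraint of exactly 500 samples per client is not enforced here.

```python
# Sketch of Dirichlet-based non-IID client partitioning in the style of
# (Hsu et al., 2020). Illustrative only; it does not reproduce the paper's
# exact split (e.g., the fixed 500 samples per client).
import numpy as np


def dirichlet_partition(labels, num_clients, alpha, seed=0):
    """Split sample indices across clients with Dirichlet(alpha) label skew."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    num_classes = int(labels.max()) + 1
    client_indices = [[] for _ in range(num_clients)]

    for c in range(num_classes):
        idx_c = rng.permutation(np.where(labels == c)[0])
        # Smaller alpha -> more skewed class proportions across clients.
        proportions = rng.dirichlet(alpha * np.ones(num_clients))
        cuts = (np.cumsum(proportions) * len(idx_c)).astype(int)[:-1]
        for client_id, shard in enumerate(np.split(idx_c, cuts)):
            client_indices[client_id].extend(shard.tolist())

    return [np.array(ix) for ix in client_indices]


# Example: partition CIFAR-10-sized labels (50k samples, 10 classes) with a
# strongly heterogeneous alpha; each client ends up dominated by few classes.
if __name__ == "__main__":
    fake_labels = np.random.randint(0, 10, size=50_000)
    parts = dirichlet_partition(fake_labels, num_clients=100, alpha=0.1)
    print([len(p) for p in parts[:5]])
```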