Rethinking the Flat Minima Searching in Federated Learning

Authors: Taehwan Lee, Sung Whan Yoon

ICML 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this paper, we empirically and theoretically analyze the relationship between heterogeneity across clients and flatness discrepancy: a strong heterogeneity leads to severe discrepancy, eventually yielding the degraded performance of the global model. Based on the findings, we propose a method called Federated Learning for Global Flatness (FedGF) that relieves flatness discrepancy, leading to flatter minima of the global model. We empirically confirm that our method shows remarkable performance gains over prior flatness-searching FL methods, ranging up to +5.09% and +10.02% gains in the heterogeneous CIFAR-10 and CIFAR-100 benchmarks, respectively.
Researcher Affiliation | Academia | 1) Graduate School of Artificial Intelligence, Ulsan National Institute of Science and Technology (UNIST), Ulsan, South Korea; 2) Department of Electrical Engineering, UNIST, Ulsan, South Korea.
Pseudocode | Yes | A pseudocode of FedGF is provided in Appendix C. Algorithm 1 presents the algorithmic pseudocode of the training procedures of FedGF.
Open Source Code | Yes | Codes are available at github.com/hwan-sig/Official-FedGF
Open Datasets | Yes | We consider two datasets that are widely used in federated learning: CIFAR-10 and CIFAR-100 (Krizhevsky et al.).
Dataset Splits | No | The paper specifies training and test sets but does not explicitly mention a separate validation set or explain how hyperparameters were tuned.
Hardware Specification | Yes | We implemented the models based on the PyTorch framework (Paszke et al., 2019) and ran the experiments with NVIDIA A5000 and A6000 processors.
Software Dependencies | No | The paper mentions the "PyTorch framework (Paszke et al., 2019)" but does not provide specific version numbers for PyTorch or any other software libraries used, such as the public implementation of (Zeng et al., 2023).
Experiment Setup | Yes | We distributed 500 data samples per client, and the number of local updates per round is 8, with batch size 64. As done in (Hsu et al., 2020), the prior distribution of local data follows the Dirichlet distribution of α, i.e., α ∈ {0, 0.005, 10} for both CIFAR-10 and CIFAR-100 experiments. We set the local learning rate as ηl = 0.01, global learning rate as ηg = 1, weight decay as 0.0004, and batch size as 64.
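
For orientation on what the flatness-searching client update described in the Research Type row involves, below is a minimal sketch of a generic SAM-style local update that a federated server could aggregate. It is an illustration under assumed names (sam_local_update, rho, the returned weight-delta format), not the authors' FedGF procedure, whose exact algorithm is given in Algorithm 1 / Appendix C of the paper.

```python
# Generic sketch of a SAM-style flat-minima-seeking local update in FL.
# NOT the authors' FedGF algorithm; all names and defaults are illustrative.
import copy
import torch


def sam_local_update(global_model, loader, loss_fn, local_steps=8,
                     lr=0.01, rho=0.05, device="cpu"):
    """Run a few SAM-style local steps starting from the current global model."""
    model = copy.deepcopy(global_model).to(device)
    params = [p for p in model.parameters() if p.requires_grad]
    data_iter = iter(loader)

    for _ in range(local_steps):
        try:
            x, y = next(data_iter)
        except StopIteration:
            data_iter = iter(loader)
            x, y = next(data_iter)
        x, y = x.to(device), y.to(device)

        # 1) Ascent step: perturb the weights toward the locally sharpest direction.
        loss_fn(model(x), y).backward()
        grad_norm = torch.norm(
            torch.stack([p.grad.norm() for p in params if p.grad is not None]))
        eps = []
        with torch.no_grad():
            for p in params:
                e = (torch.zeros_like(p) if p.grad is None
                     else rho * p.grad / (grad_norm + 1e-12))
                p.add_(e)
                eps.append(e)
                p.grad = None

        # 2) Descent step: gradient at the perturbed point, applied to the
        #    original weights, so the client moves toward a flatter region.
        loss_fn(model(x), y).backward()
        with torch.no_grad():
            for p, e in zip(params, eps):
                p.sub_(e)                       # undo the ascent perturbation
                if p.grad is not None:
                    p.sub_(lr * p.grad)         # SGD step with the SAM gradient
                p.grad = None

    # Return the weight delta that the server would aggregate.
    global_state = global_model.state_dict()
    return {k: v.detach().cpu() - global_state[k].cpu()
            for k, v in model.state_dict().items()}
```

In a standard FedOpt-style server step, the reported global learning rate ηg = 1 corresponds to adding the plain average of the returned client deltas to the global weights.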
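
The Experiment Setup row partitions local data with a Dirichlet prior as in (Hsu et al., 2020). Below is a minimal sketch of that label-skew partitioning for a CIFAR-style labeled dataset; the function name, seed handling, and defaults are illustrative assumptions, and the paper's additional constraint of exactly 500 samples per client is not enforced here.

```python
# Sketch of Dirichlet-based non-IID client partitioning in the style of
# (Hsu et al., 2020). Illustrative only; it does not reproduce the paper's
# exact split (e.g., the fixed 500 samples per client).
import numpy as np


def dirichlet_partition(labels, num_clients, alpha, seed=0):
    """Split sample indices across clients with Dirichlet(alpha) label skew."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    num_classes = int(labels.max()) + 1
    client_indices = [[] for _ in range(num_clients)]

    for c in range(num_classes):
        idx_c = rng.permutation(np.where(labels == c)[0])
        # Smaller alpha -> more skewed class proportions across clients.
        proportions = rng.dirichlet(alpha * np.ones(num_clients))
        cuts = (np.cumsum(proportions) * len(idx_c)).astype(int)[:-1]
        for client_id, shard in enumerate(np.split(idx_c, cuts)):
            client_indices[client_id].extend(shard.tolist())

    return [np.array(ix) for ix in client_indices]


# Example: partition CIFAR-10-sized labels (50k samples, 10 classes) with a
# strongly heterogeneous alpha; each client ends up dominated by few classes.
if __name__ == "__main__":
    fake_labels = np.random.randint(0, 10, size=50_000)
    parts = dirichlet_partition(fake_labels, num_clients=100, alpha=0.1)
    print([len(p) for p in parts[:5]])
```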