Out-of-Distribution Generalization of Federated Learning via Implicit Invariant Relationships
Authors: Yaming Guo, Kai Guo, Xiaofeng Cao, Tieru Wu, Yi Chang
ICML 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments show that FEDIIR significantly outperforms relevant baselines in terms of out-of-distribution generalization of federated learning. We validate the effectiveness of the proposed method using two scenarios: a small number of clients and a large number of clients (limited communication). |
| Researcher Affiliation | Academia | School of Artificial Intelligence, Jilin University, Changchun, China. Correspondence to: Xiaofeng Cao <xiaofengcao@jlu.edu.cn>, Tieru Wu <wutr@jlu.edu.cn>. |
| Pseudocode | Yes | A standard algorithm for solving (ERM) is FEDAVG (McMahan et al., 2017), whose pseudocode is presented in Algorithm 1. (A minimal FedAvg sketch is given after this table.) |
| Open Source Code | Yes | Our code will be released at https://github.com/YamingGuo98/FedIIR. |
| Open Datasets | Yes | We conduct extensive experiments on four widely used datasets, including Rotated MNIST (Ghifary et al., 2015), VLCS (Fang et al., 2013), PACS (Li et al., 2017), and OfficeHome (Venkateswara et al., 2017). |
| Dataset Splits | Yes | Per common practice, we allocate 90% of the available data for training and 10% for validation. |
| Hardware Specification | No | The paper does not specify any particular hardware (e.g., GPU/CPU models, memory) used for running the experiments. |
| Software Dependencies | No | The paper mentions using "stochastic gradient descent (SGD)" but does not specify any software libraries, frameworks, or their version numbers. |
| Experiment Setup | Yes | For each dataset, we only tune hyperparameters via grid search in the scenario with a small number of clients and do not modify them for the scenarios with a larger number of clients (see Appendix F.3). In all experiments, we train the global model using the global step-size ηg = 1 for 100 communication rounds, where the local model on each client is trained with stochastic gradient descent (SGD) for one epoch. Table 3 in Appendix F.3 specifies: local step-size ηl (e.g., 1e-2), batch size (e.g., 64), regularization strength γ (e.g., 1e-2), number of rounds T (100), global step-size ηg (1), EMA υ (0.95), and seeds (0, 1, 2). These values are collected in the configuration sketch below. |
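The pseudocode row refers to FedAvg (Algorithm 1 in the paper). The snippet below is a minimal, self-contained sketch of the FedAvg communication loop for orientation only; it is not the authors' released code, and the linear-model local update, function names, and toy data are illustrative assumptions.

```python
# Minimal FedAvg sketch (illustrative only, not the authors' released code).
# Each client runs local SGD on its own data; the server averages the
# resulting local models once per communication round.
import numpy as np

def local_sgd(w, X, y, lr=1e-2, epochs=1, batch_size=64):
    """One client's local update: plain SGD on a squared loss (assumed model)."""
    w = w.copy()
    n = len(y)
    for _ in range(epochs):
        idx = np.random.permutation(n)
        for start in range(0, n, batch_size):
            b = idx[start:start + batch_size]
            grad = X[b].T @ (X[b] @ w - y[b]) / len(b)
            w -= lr * grad
    return w

def fedavg(clients, w0, rounds=100, global_lr=1.0):
    """Server loop: broadcast the global model, collect local models, average."""
    w = w0.copy()
    for _ in range(rounds):
        local_models = [local_sgd(w, X, y) for X, y in clients]
        avg = np.mean(local_models, axis=0)
        # Global step with step-size eta_g (eta_g = 1 recovers plain averaging).
        w = w + global_lr * (avg - w)
    return w

# Toy usage: 4 clients, each with its own synthetic regression data.
rng = np.random.default_rng(0)
clients = [(rng.normal(size=(200, 5)), rng.normal(size=200)) for _ in range(4)]
w_global = fedavg(clients, w0=np.zeros(5), rounds=10)
```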
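The Experiment Setup row lists the hyperparameters reported in Appendix F.3 (Table 3). The dictionary below simply collects those reported values in one place; the key names and layout are assumptions made for readability, not part of the paper or its released code.

```python
# Reported hyperparameters from the Experiment Setup row (values quoted from
# Table 3, Appendix F.3); the dictionary structure itself is an assumption.
CONFIG = {
    "local_step_size": 1e-2,   # eta_l, tuned per dataset via grid search
    "batch_size": 64,
    "gamma": 1e-2,             # regularization strength for the method
    "rounds": 100,             # communication rounds T
    "global_step_size": 1.0,   # eta_g
    "ema": 0.95,               # exponential moving average coefficient
    "seeds": [0, 1, 2],
    "local_epochs": 1,         # one local SGD epoch per round
}
```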