What Do We Mean by Generalization in Federated Learning?
Authors: Honglin Yuan, Warren Richard Morningstar, Lin Ning, Karan Singhal
ICLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct experiments in six settings, including four image classification tasks: EMNIST-10 (digits only), EMNIST-62 (digits and characters) (Cohen et al., 2017; Caldas et al., 2019), CIFAR-10 and CIFAR-100 (Krizhevsky et al., 2009); and two next character/word prediction tasks: Shakespeare (Caldas et al., 2019) and Stack Overflow (Reddi et al., 2021). We use FedAvgM for image classification tasks and FedAdam for text-based tasks (Reddi et al., 2021). The detailed setups (including model, dataset preprocessing, hyperparameter tuning) are relegated to Appendix C. We summarize our main results in Figure 1 and Table 1. |
| Researcher Affiliation | Collaboration | Honglin Yuan (Stanford University, hongl.yuan@gmail.com); Warren Morningstar, Lin Ning, Karan Singhal (Google Research, {wmorning, linning, karansinghal}@google.com) |
| Pseudocode | No | The paper describes a two-stage procedure in Section 4.1 and Appendix D.1; the procedure (Algorithm D.1) is presented as a prose description rather than a formally labeled pseudocode or algorithm block. |
| Open Source Code | Yes | We are also releasing an extensible code framework for studying generalization in FL (see Reproducibility Statement). We include all tasks reported in this work; the framework is easily extended with additional tasks. We also include libraries for performing label-based and semantic dataset partitioning (enabling new benchmark datasets for future works, see Appendix D). This framework enables easy reproduction of our results and facilitates future work. The framework is implemented using TensorFlow Federated (Ingerman & Ostrowski, 2019). The code is released under Apache License 2.0. We hope that the release of this code encourages researchers to take up our suggestions presented in Section 6. Please visit https://bit.ly/fl-generalization for the code repository. (See the hedged partitioning sketch below the table.) |
| Open Datasets | Yes | The EMNIST dataset (Cohen et al., 2017) is a hand-written character recognition dataset derived from the NIST Special Database 19 (Grother & Flanagan, 1995). ... The CIFAR-10 and CIFAR-100 datasets (Krizhevsky et al., 2009)... The Shakespeare dataset (Caldas et al., 2019)... The Stack Overflow dataset... |
| Dataset Splits | Yes | To estimate these two risks in practice, we propose splitting datasets into three blocks. ... Given a dataset with client assignment, we first hold out a percentage of clients (e.g., 20%) as unparticipating clients... Within each participating client, we hold out a percentage of data (e.g., 20%) as participating validation data... For CIFAR-10 and CIFAR-100... we hold out 20% of clients (60 for CIFAR-10, 20 for CIFAR-100) as unparticipating clients, and leave the remaining clients as participating clients. Within each participating client, we hold out 20% of data as (participating) validation data. (See the hedged splitting sketch below the table.) |
| Hardware Specification | No | The paper does not provide specific details on the hardware used for the experiments, such as GPU or CPU models. |
| Software Dependencies | No | The paper states, “The framework is implemented using TensorFlow Federated (Ingerman & Ostrowski, 2019).” However, it does not specify a version number for TensorFlow Federated or any other key software libraries used. |
| Experiment Setup | Yes | The detailed setups (including model, dataset preprocessing, hyperparameter tuning) are relegated to Appendix C. For centralized training, we run 200 epochs of SGD with momentum = 0.9, a constant learning rate, and batch size 50. The (centralized) learning rate is tuned from {10^-2.5, 10^-2, ..., 10^-0.5}. For federated training, we run 3000 rounds of FedAvgM (Reddi et al., 2021) with server momentum = 0.9 and constant server and client learning rates. For each communication round, we uniformly sample 20 clients to train for 1 epoch with client batch size 20. The client and server learning rates are both tuned from {10^-2, 10^-1.5, ..., 1}. (See the configuration sketch below the table.) |
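
The Open Source Code row mentions libraries for label-based and semantic dataset partitioning. The snippet below is a generic, hedged illustration of one common form of label-based partitioning (a per-client Dirichlet distribution over labels); it is not the authors' released library, and the function name `dirichlet_label_partition`, the concentration parameter `alpha`, and the overall scheme are assumptions for illustration only.

```python
# Hedged sketch: generic label-based partitioning of a centralized dataset
# into synthetic "clients". This is NOT the authors' released library; the
# Dirichlet-over-labels scheme and all names here are illustrative assumptions.
import numpy as np


def dirichlet_label_partition(labels, num_clients, alpha=0.5, seed=0):
    """Assigns example indices to clients using per-client Dirichlet label mixes.

    labels: 1-D array of integer class labels for the centralized dataset.
    Returns: a list of index arrays, one per client.
    """
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    num_classes = int(labels.max()) + 1
    client_indices = [[] for _ in range(num_clients)]

    for c in range(num_classes):
        idx = rng.permutation(np.flatnonzero(labels == c))
        # Sample how class c is divided across clients, then cut accordingly.
        proportions = rng.dirichlet(alpha * np.ones(num_clients))
        cuts = (np.cumsum(proportions)[:-1] * len(idx)).astype(int)
        for client_id, chunk in enumerate(np.split(idx, cuts)):
            client_indices[client_id].extend(chunk.tolist())

    return [np.array(ix) for ix in client_indices]
```

Smaller `alpha` makes each synthetic client's label distribution more skewed, which is the usual knob for controlling heterogeneity in label-based partitions.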
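The Dataset Splits row describes a three-block split: hold out a fraction of clients entirely (unparticipating clients), then hold out a fraction of data within each remaining client as participating validation data. The sketch below is a minimal illustration of that description, not the authors' released code; `three_block_split` is a hypothetical name, and the default 20%/20% fractions follow the example values quoted above.

```python
# Minimal sketch of the three-block split (participating train,
# participating validation, unparticipating), following the paper's
# description. Names and data layout are illustrative assumptions.
import random
from typing import Dict, List, Tuple

ClientData = Dict[str, List]  # client id -> list of examples


def three_block_split(
    clients: ClientData,
    unparticipating_frac: float = 0.2,
    validation_frac: float = 0.2,
    seed: int = 0,
) -> Tuple[ClientData, ClientData, ClientData]:
    rng = random.Random(seed)

    # 1) Hold out a fraction of clients entirely (unparticipating clients).
    client_ids = sorted(clients)
    rng.shuffle(client_ids)
    n_unpart = int(len(client_ids) * unparticipating_frac)
    unparticipating = {cid: clients[cid] for cid in client_ids[:n_unpart]}

    # 2) Within each participating client, hold out a fraction of its
    #    examples as participating validation data.
    part_train: ClientData = {}
    part_val: ClientData = {}
    for cid in client_ids[n_unpart:]:
        examples = list(clients[cid])
        rng.shuffle(examples)
        n_val = int(len(examples) * validation_frac)
        part_val[cid] = examples[:n_val]
        part_train[cid] = examples[n_val:]

    return part_train, part_val, unparticipating


# Usage: train, val, unpart = three_block_split(my_clients)
# where my_clients maps each client id to its list of examples.
```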
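The Experiment Setup row quotes the centralized and federated training configurations. The snippet below simply re-expresses those quoted numbers (epochs, rounds, batch sizes, momentum, and the log-spaced learning-rate grids) as a configuration sketch; the dictionary layout and names such as `FED_CONFIG` are assumptions for illustration and are not tied to the authors' framework.

```python
# Hedged sketch of the tuning grids and round-level parameters quoted above.
# The numeric values come from the paper; the structure is illustrative only.
import numpy as np

# Centralized baseline: 200 epochs of SGD, momentum 0.9, batch size 50,
# learning rate tuned over 10^-2.5, 10^-2, ..., 10^-0.5.
centralized_lrs = 10.0 ** np.arange(-2.5, 0.0, 0.5)

# Federated training: 3000 rounds of FedAvgM, server momentum 0.9,
# 20 clients sampled per round, 1 local epoch, client batch size 20.
# Client and server learning rates each tuned over 10^-2, 10^-1.5, ..., 1.
federated_lrs = 10.0 ** np.arange(-2.0, 0.5, 0.5)

FED_CONFIG = {
    "rounds": 3000,
    "clients_per_round": 20,
    "local_epochs": 1,
    "client_batch_size": 20,
    "server_momentum": 0.9,
}

# Full grid over client/server learning-rate pairs.
grid = [
    {"client_lr": c_lr, "server_lr": s_lr, **FED_CONFIG}
    for c_lr in federated_lrs
    for s_lr in federated_lrs
]
```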