On the Importance and Applicability of Pre-Training for Federated Learning
Authors: Hong-You Chen, Cheng-Hao Tu, Ziwei Li, Han-Wei Shen, Wei-Lun Chao
ICLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Across multiple visual recognition benchmarks, we found that pre-training can not only improve FL but also close its accuracy gap to its centralized learning counterpart, especially in the challenging cases of non-IID client data. (Section 4.2, Experimental Setup and Implementation Details) |
| Researcher Affiliation | Academia | Hong-You Chen, Cheng-Hao Tu, Ziwei Li, Han-Wei Shen, Wei-Lun Chao Department of Computer Science and Engineering, The Ohio State University |
| Pseudocode | No | The paper does not contain any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | Our PyTorch code implementation is provided at https://github.com/andytu28/FPS_Pre-training. |
| Open Datasets | Yes | We conduct the study using five visual recognition datasets: CIFAR-10, CIFAR-100 (Krizhevsky et al., 2009), Tiny ImageNet (Le & Yang, 2015), iNaturalist (Van Horn et al., 2018), and Cityscapes (Cordts et al., 2015). |
| Dataset Splits | Yes | To simulate non-IID conditions across clients, we follow (Hsu et al., 2019) to partition the training set of CIFAR-10, CIFAR-100, and Tiny-ImageNet into M clients. We reserve 2% of the training set as the validation set for hyperparameter tuning (e.g., for the learning rate). (A sketch of this Dirichlet-style partition follows the table.) |
| Hardware Specification | Yes | For experiments with 32×32 images, we trained on a 2080 Ti GPU. For experiments with 224×224 images, we trained on an A6000 GPU. For the Cityscapes dataset, we trained with an A6000 GPU for about 2 days. |
| Software Dependencies | No | The paper mentions software like PyTorch, the SGD optimizer, the Adam optimizer, and DistilBERT, but does not provide specific version numbers for these software components. |
| Experiment Setup | Yes | We perform FEDAVG for 100 iterative rounds. Each round of local training takes 5 epochs. We use the SGD optimizer with weight decay 1e-4 and a 0.9 momentum, except that on DeepLabV3+ we use an Adam (Kingma & Ba, 2015) optimizer. We follow the literature (He et al., 2016b) to decay the learning rate by 0.1 every 30 rounds. Table 9 provides specific learning rates and batch sizes. (A minimal FEDAVG loop with this schedule is sketched after the table.) |
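The Dataset Splits row cites Hsu et al. (2019), whose non-IID partition draws per-class client proportions from a Dirichlet distribution. The sketch below illustrates that scheme under stated assumptions: the concentration value `alpha`, the client count, and the CIFAR-10 loading shown in the comments are illustrative placeholders, not values quoted from the paper.

```python
import numpy as np

def dirichlet_partition(labels, num_clients, alpha, seed=0):
    """Split sample indices into non-IID client shards by drawing
    per-class client proportions from a Dirichlet(alpha) distribution,
    in the style of Hsu et al. (2019)."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    num_classes = int(labels.max()) + 1
    client_indices = [[] for _ in range(num_clients)]
    for c in range(num_classes):
        idx_c = np.where(labels == c)[0]
        rng.shuffle(idx_c)
        # Fraction of class c assigned to each client.
        proportions = rng.dirichlet(alpha * np.ones(num_clients))
        # Turn the fractions into split points over the class-c samples.
        cuts = (np.cumsum(proportions) * len(idx_c)).astype(int)[:-1]
        for client_id, shard in enumerate(np.split(idx_c, cuts)):
            client_indices[client_id].extend(shard.tolist())
    return [np.array(ix) for ix in client_indices]

# Hypothetical usage: partition CIFAR-10 labels into 10 clients; a smaller
# alpha gives more skewed (more non-IID) clients.
# labels = [y for _, y in torchvision.datasets.CIFAR10(root="data", train=True)]
# shards = dirichlet_partition(labels, num_clients=10, alpha=0.3)
```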
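The Experiment Setup row specifies the FEDAVG schedule: 100 rounds, 5 local epochs of SGD (momentum 0.9, weight decay 1e-4), and a step learning-rate decay of 0.1 every 30 rounds. The sketch below wires those quoted numbers into a bare-bones FedAvg loop; `base_lr`, full client participation each round, and dataset-size-weighted aggregation are assumptions for illustration, since the paper's exact learning rates, batch sizes, and sampling details live in its Table 9 and code.

```python
import copy
import torch

def fedavg(global_model, client_loaders, rounds=100, local_epochs=5,
           base_lr=0.01, device="cuda"):
    """Minimal FedAvg loop matching the quoted schedule: 100 rounds,
    5 local epochs of SGD (momentum 0.9, weight decay 1e-4), learning
    rate decayed by 0.1 every 30 rounds. base_lr is a placeholder."""
    global_model.to(device)
    criterion = torch.nn.CrossEntropyLoss()
    for rnd in range(rounds):
        lr = base_lr * (0.1 ** (rnd // 30))   # step decay every 30 rounds
        local_states, local_sizes = [], []
        for loader in client_loaders:          # simplification: all clients participate
            model = copy.deepcopy(global_model)
            opt = torch.optim.SGD(model.parameters(), lr=lr,
                                  momentum=0.9, weight_decay=1e-4)
            model.train()
            for _ in range(local_epochs):
                for x, y in loader:
                    x, y = x.to(device), y.to(device)
                    opt.zero_grad()
                    criterion(model(x), y).backward()
                    opt.step()
            local_states.append(model.state_dict())
            local_sizes.append(len(loader.dataset))
        # FedAvg aggregation: average client weights, here weighted by
        # client dataset size (a common convention, assumed rather than quoted).
        total = sum(local_sizes)
        new_state = copy.deepcopy(local_states[0])
        for key in new_state:
            avg = sum((n / total) * st[key].float()
                      for st, n in zip(local_states, local_sizes))
            new_state[key] = avg.to(new_state[key].dtype)
        global_model.load_state_dict(new_state)
    return global_model
```

For the paper's DeepLabV3+ experiments the quote swaps SGD for Adam; in this sketch that would amount to replacing the optimizer construction with `torch.optim.Adam(model.parameters(), lr=lr)`.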