On the Importance and Applicability of Pre-Training for Federated Learning

Authors: Hong-You Chen, Cheng-Hao Tu, Ziwei Li, Han-Wei Shen, Wei-Lun Chao

ICLR 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Across multiple visual recognition benchmarks, we found that pre-training can not only improve FL, but also close its accuracy gap to the counterpart centralized learning, especially in the challenging cases of non-IID clients' data. (See Section 4.2, Experimental Setup and Implementation Details.)
Researcher Affiliation | Academia | Hong-You Chen, Cheng-Hao Tu, Ziwei Li, Han-Wei Shen, Wei-Lun Chao; Department of Computer Science and Engineering, The Ohio State University.
Pseudocode | No | The paper does not contain any clearly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | Our PyTorch code implementation is provided at https://github.com/andytu28/FPS_Pre-training.
Open Datasets | Yes | We conduct the study using five visual recognition datasets: CIFAR-10, CIFAR-100 (Krizhevsky et al., 2009), Tiny ImageNet (Le & Yang, 2015), iNaturalist (Van Horn et al., 2018), and Cityscapes (Cordts et al., 2015).
Dataset Splits | Yes | To simulate non-IID conditions across clients, we follow (Hsu et al., 2019) to partition the training set of CIFAR-10, CIFAR-100, and Tiny-ImageNet into M clients. We reserve 2% of the training set as the validation set for hyperparameter tuning (e.g., for the learning rate). A minimal partition sketch is given after the table.
Hardware Specification | Yes | For experiments with 32×32 images, we trained on a 2080 Ti GPU. For experiments with 224×224 images, we trained on an A6000 GPU. For the Cityscapes dataset, we trained with an A6000 GPU for about 2 days.
Software Dependencies | No | The paper mentions software like PyTorch, the SGD optimizer, the Adam optimizer, and DistilBERT, but does not provide specific version numbers for these software components.
Experiment Setup | Yes | We perform FedAvg for 100 iterative rounds. Each round of local training takes 5 epochs. We use the SGD optimizer with weight decay 1e-4 and 0.9 momentum, except that on DeepLabV3+ we use an Adam (Kingma & Ba, 2015) optimizer. We follow the literature (He et al., 2016b) to decay the learning rate by 0.1 every 30 rounds. Table 9 provides specific learning rates and batch sizes. A minimal FedAvg training-loop sketch is given after the table.
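
The Dataset Splits row refers to the Dirichlet-based non-IID partition of Hsu et al. (2019). The sketch below is a minimal illustration, not the authors' code: the function name `dirichlet_partition`, the concentration value `alpha=0.3`, the client count, and the synthetic labels are all assumptions made for the example.

```python
# Sketch of a Dirichlet non-IID client partition (Hsu et al., 2019), assumed
# to approximate the split used for CIFAR-10/100 and Tiny-ImageNet.
import numpy as np

def dirichlet_partition(labels, num_clients, alpha=0.3, seed=0):
    """Split example indices into `num_clients` non-IID shards.

    For each class, sample a proportion vector from Dirichlet(alpha) and
    assign that class's examples to clients according to those proportions.
    Smaller alpha gives more skewed (more non-IID) client label distributions.
    """
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    client_indices = [[] for _ in range(num_clients)]
    for c in np.unique(labels):
        idx = rng.permutation(np.where(labels == c)[0])
        proportions = rng.dirichlet(alpha * np.ones(num_clients))
        # Convert proportions to split points along this class's index list.
        cuts = (np.cumsum(proportions)[:-1] * len(idx)).astype(int)
        for client_id, shard in enumerate(np.split(idx, cuts)):
            client_indices[client_id].extend(shard.tolist())
    return [np.array(ix) for ix in client_indices]

# Example: partition CIFAR-10-style labels across M = 10 clients after first
# reserving 2% of the 50k training examples for validation (as in the paper).
labels = np.random.randint(0, 10, size=49000)   # synthetic labels, 98% of 50k
shards = dirichlet_partition(labels, num_clients=10, alpha=0.3)
```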
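
The Experiment Setup row describes a standard FedAvg loop with the stated hyperparameters: 100 rounds, 5 local epochs per round, SGD with momentum 0.9 and weight decay 1e-4, and a learning rate decayed by 0.1 every 30 rounds. The following PyTorch sketch applies those settings; the base learning rate, full client participation, and the model/loader construction are placeholder assumptions, not the authors' exact configuration.

```python
# Minimal FedAvg sketch under the reported hyperparameters (assumed defaults
# for anything the row does not specify, e.g. base_lr and participation).
import copy
import torch

def local_update(global_model, loader, lr, epochs=5, device="cpu"):
    """Train a copy of the global model on one client's data; return its weights."""
    model = copy.deepcopy(global_model).to(device)
    opt = torch.optim.SGD(model.parameters(), lr=lr,
                          momentum=0.9, weight_decay=1e-4)
    loss_fn = torch.nn.CrossEntropyLoss()
    model.train()
    for _ in range(epochs):
        for x, y in loader:
            x, y = x.to(device), y.to(device)
            opt.zero_grad()
            loss_fn(model(x), y).backward()
            opt.step()
    return {k: v.cpu() for k, v in model.state_dict().items()}

def fedavg(global_model, client_loaders, rounds=100, base_lr=0.01):
    for rnd in range(rounds):
        lr = base_lr * (0.1 ** (rnd // 30))      # decay by 0.1 every 30 rounds
        client_states, client_sizes = [], []
        for loader in client_loaders:            # full participation assumed
            client_states.append(local_update(global_model, loader, lr))
            client_sizes.append(len(loader.dataset))
        # Aggregate: average client weights, weighted by local dataset size.
        total = sum(client_sizes)
        avg = {k: sum(s[k].float() * (n / total)
                      for s, n in zip(client_states, client_sizes))
               for k in client_states[0]}
        global_model.load_state_dict(avg)
    return global_model
```

The same loop accommodates the paper's pre-training comparison by simply changing how `global_model` is initialized (random weights versus pre-trained weights) before calling `fedavg`.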