On the Importance and Applicability of Pre-Training for Federated Learning

Authors: Hong-You Chen, Cheng-Hao Tu, Ziwei Li, Han-Wei Shen, Wei-Lun Chao

ICLR 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Across multiple visual recognition benchmarks, we found that pre-training can not only improve FL, but also close its accuracy gap to the counterpart centralized learning, especially in the challenging cases of non-IID clients' data. (See Section 4.2, Experimental Setup and Implementation Details.)
Researcher Affiliation | Academia | Hong-You Chen, Cheng-Hao Tu, Ziwei Li, Han-Wei Shen, Wei-Lun Chao; Department of Computer Science and Engineering, The Ohio State University.
Pseudocode | No | The paper does not contain any clearly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | Our PyTorch code implementation is provided at https://github.com/andytu28/FPS_Pre-training.
Open Datasets | Yes | We conduct the study using five visual recognition datasets: CIFAR-10, CIFAR-100 (Krizhevsky et al., 2009), Tiny ImageNet (Le & Yang, 2015), iNaturalist (Van Horn et al., 2018), and Cityscapes (Cordts et al., 2015).
Dataset Splits | Yes | To simulate non-IID conditions across clients, we follow (Hsu et al., 2019) to partition the training set of CIFAR-10, CIFAR-100, and Tiny-ImageNet into M clients. We reserve 2% of the training set as the validation set for hyperparameter tuning (e.g., for the learning rate). A minimal partition sketch is given after the table.
Hardware Specification | Yes | For experiments with 32×32 images, we trained on a 2080 Ti GPU. For experiments with 224×224 images, we trained on an A6000 GPU. For the Cityscapes dataset, we trained with an A6000 GPU for about 2 days.
Software Dependencies | No | The paper mentions software like PyTorch, the SGD optimizer, the Adam optimizer, and DistilBERT, but does not provide specific version numbers for these software components.
Experiment Setup | Yes | We perform FedAvg for 100 iterative rounds. Each round of local training takes 5 epochs. We use the SGD optimizer with weight decay 1e-4 and 0.9 momentum, except that on DeepLabV3+ we use an Adam (Kingma & Ba, 2015) optimizer. We follow the literature (He et al., 2016b) to decay the learning rate by 0.1 every 30 rounds. Table 9 provides specific learning rates and batch sizes. A minimal FedAvg training-loop sketch is given after the table.
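
The Dataset Splits row refers to the Dirichlet-based non-IID partition of Hsu et al. (2019). The sketch below is a minimal illustration, not the authors' code: the function name `dirichlet_partition`, the concentration value `alpha=0.3`, the client count, and the synthetic labels are all assumptions made for the example.

```python
# Sketch of a Dirichlet non-IID client partition (Hsu et al., 2019), assumed
# to approximate the split used for CIFAR-10/100 and Tiny-ImageNet.
import numpy as np

def dirichlet_partition(labels, num_clients, alpha=0.3, seed=0):
    """Split example indices into `num_clients` non-IID shards.

    For each class, sample a proportion vector from Dirichlet(alpha) and
    assign that class's examples to clients according to those proportions.
    Smaller alpha gives more skewed (more non-IID) client label distributions.
    """
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    client_indices = [[] for _ in range(num_clients)]
    for c in np.unique(labels):
        idx = rng.permutation(np.where(labels == c)[0])
        proportions = rng.dirichlet(alpha * np.ones(num_clients))
        # Convert proportions to split points along this class's index list.
        cuts = (np.cumsum(proportions)[:-1] * len(idx)).astype(int)
        for client_id, shard in enumerate(np.split(idx, cuts)):
            client_indices[client_id].extend(shard.tolist())
    return [np.array(ix) for ix in client_indices]

# Example: partition CIFAR-10-style labels across M = 10 clients after first
# reserving 2% of the 50k training examples for validation (as in the paper).
labels = np.random.randint(0, 10, size=49000)   # synthetic labels, 98% of 50k
shards = dirichlet_partition(labels, num_clients=10, alpha=0.3)
```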
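
The Experiment Setup row describes a standard FedAvg loop with the stated hyperparameters: 100 rounds, 5 local epochs per round, SGD with momentum 0.9 and weight decay 1e-4, and a learning rate decayed by 0.1 every 30 rounds. The following PyTorch sketch applies those settings; the base learning rate, full client participation, and the model/loader construction are placeholder assumptions, not the authors' exact configuration.

```python
# Minimal FedAvg sketch under the reported hyperparameters (assumed defaults
# for anything the row does not specify, e.g. base_lr and participation).
import copy
import torch

def local_update(global_model, loader, lr, epochs=5, device="cpu"):
    """Train a copy of the global model on one client's data; return its weights."""
    model = copy.deepcopy(global_model).to(device)
    opt = torch.optim.SGD(model.parameters(), lr=lr,
                          momentum=0.9, weight_decay=1e-4)
    loss_fn = torch.nn.CrossEntropyLoss()
    model.train()
    for _ in range(epochs):
        for x, y in loader:
            x, y = x.to(device), y.to(device)
            opt.zero_grad()
            loss_fn(model(x), y).backward()
            opt.step()
    return {k: v.cpu() for k, v in model.state_dict().items()}

def fedavg(global_model, client_loaders, rounds=100, base_lr=0.01):
    for rnd in range(rounds):
        lr = base_lr * (0.1 ** (rnd // 30))      # decay by 0.1 every 30 rounds
        client_states, client_sizes = [], []
        for loader in client_loaders:            # full participation assumed
            client_states.append(local_update(global_model, loader, lr))
            client_sizes.append(len(loader.dataset))
        # Aggregate: average client weights, weighted by local dataset size.
        total = sum(client_sizes)
        avg = {k: sum(s[k].float() * (n / total)
                      for s, n in zip(client_states, client_sizes))
               for k in client_states[0]}
        global_model.load_state_dict(avg)
    return global_model
```

The same loop accommodates the paper's pre-training comparison by simply changing how `global_model` is initialized (random weights versus pre-trained weights) before calling `fedavg`.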