SWIFT: Rapid Decentralized Federated Learning via Wait-Free Model Communication

Authors: Marco Bornstein, Tahseen Rabbani, Evan Z Wang, Amrit Bedi, Furong Huang

ICLR 2023

Each entry below lists a reproducibility variable, its result, and the LLM response supporting that judgment.

Research Type: Experimental
LLM Response: "Our experimental results demonstrate that SWIFT's run-time is reduced due to a large reduction in communication time per epoch, which falls by an order of magnitude compared to synchronous counterparts. Furthermore, SWIFT produces loss levels for image classification, over IID and non-IID data settings, upwards of 50% faster than existing SOTA algorithms. Below, we perform image classification experiments for a range of decentralized FL algorithms (Krizhevsky et al., 2009)."

Researcher Affiliation: Academia
LLM Response: "Marco Bornstein, Tahseen Rabbani, Evan Wang, Amrit Singh Bedi, & Furong Huang. Department of Computer Science, University of Maryland. {marcob, trabbani, ezw, amritbd, furongh}@umd.edu"

Pseudocode: Yes
LLM Response: "A diagram and algorithmic block of SWIFT are depicted in Figure 1 and Algorithm 1, respectively. Algorithm 1: Shared WaIt-Free Transmission (SWIFT)"
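
To make the wait-free communication pattern concrete, the sketch below mimics the flow of Algorithm 1 with mpi4py: each client runs a local update, posts non-blocking sends of its model to its neighbors, drains whatever neighbor models have already arrived without ever blocking, and averages with the freshest copies it holds. This is an illustrative toy (ring topology, uniform mixing weights, a quadratic toy objective), not the authors' implementation.

```python
# Run with e.g.: mpiexec -n 4 python swift_sketch.py
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

# Hypothetical ring topology: each client exchanges models with two neighbors.
neighbors = [(rank - 1) % size, (rank + 1) % size]
w = 1.0 / (len(neighbors) + 1)   # uniform mixing weight over self + neighbors

dim, lr = 10, 0.1
model = np.random.rand(dim)                    # toy "model" parameters
latest = {j: model.copy() for j in neighbors}  # last model received per neighbor
pending = []                                   # outstanding non-blocking sends

for step in range(100):
    grad = 2.0 * model        # toy gradient (objective ||x||^2)
    model -= lr * grad

    # Wait-free send: post non-blocking sends of a snapshot and move on.
    for j in neighbors:
        pending.append(comm.Isend(model.copy(), dest=j, tag=0))

    # Wait-free receive: drain whatever has already arrived; never block.
    for j in neighbors:
        while comm.Iprobe(source=j, tag=0):
            comm.Recv(latest[j], source=j, tag=0)

    # Average the local model with the freshest neighbor copies available.
    model = w * model + w * sum(latest[j] for j in neighbors)

# Simplified shutdown: drain stragglers, then complete the posted sends.
# A real implementation would manage request completion more carefully.
comm.Barrier()
for j in neighbors:
    while comm.Iprobe(source=j, tag=0):
        comm.Recv(latest[j], source=j, tag=0)
MPI.Request.Waitall(pending)
print(f"client {rank}: final ||model|| = {np.linalg.norm(model):.4f}")
```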

Open Source Code: Yes
LLM Response: "Code for SWIFT can be found on GitHub at https://github.com/umd-huang-lab/SWIFT."

Open Datasets: Yes
LLM Response: "As stated in Section 6, we perform image classification experiments on the CIFAR-10 dataset (Krizhevsky et al., 2009)."
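
For reference, CIFAR-10 is straightforward to obtain through torchvision; the snippet below is one standard way to load the train and test sets (the normalization constants are common CIFAR-10 statistics, not values taken from the paper).

```python
import torchvision
import torchvision.transforms as T

transform = T.Compose([
    T.ToTensor(),
    T.Normalize((0.4914, 0.4822, 0.4465), (0.2470, 0.2435, 0.2616)),
])

train_set = torchvision.datasets.CIFAR10(
    root="./data", train=True, download=True, transform=transform)
test_set = torchvision.datasets.CIFAR10(
    root="./data", train=False, download=True, transform=transform)

print(len(train_set), len(test_set))  # 50000 train / 10000 test images
```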

Dataset Splits: No
LLM Response: The paper discusses how the training data is partitioned among clients and mentions a test set, but it does not provide specific details or percentages for a separate validation split.
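
A reproducer who wants a validation set therefore has to choose a split themselves. A hypothetical 90/10 split of the CIFAR-10 training set (continuing from the snippet above; the 45000/5000 sizes and the seed are assumptions, not values from the paper) could look like this:

```python
import torch
from torch.utils.data import random_split

# `train_set` is the CIFAR-10 training set loaded in the snippet above.
generator = torch.Generator().manual_seed(42)   # fixed seed so the split is reproducible
train_subset, val_subset = random_split(train_set, [45000, 5000], generator=generator)
print(len(train_subset), len(val_subset))       # 45000 / 5000
```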

Hardware Specification: Yes
LLM Response: "Each node has an NVIDIA GeForce RTX 2080 Ti GPU."

Software Dependencies: No
LLM Response: The paper mentions Python, Open MPI, mpi4py, and PyTorch but does not provide specific version numbers for any of these software dependencies.
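
Absent pinned versions, the best a reproducer can do is record their own environment; the snippet below prints the version strings that each of the named packages exposes.

```python
import sys
import torch
import mpi4py
from mpi4py import MPI

print("Python :", sys.version.split()[0])
print("PyTorch:", torch.__version__)
print("mpi4py :", mpi4py.__version__)
print("MPI lib:", MPI.Get_library_version().strip())  # reports the underlying MPI (e.g. Open MPI)
```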

Experiment Setup: Yes
LLM Response: "In Table 6, one can see that the step-size decay column is split into the following sections: rate, E, and frequency. The rate is the decay rate for the step-size. For example, in the Baseline row, the step-size decays by 1/10. The term E is the epoch at which decay begins during training. For example, in the Baseline row, the step-size decays at E = 81 and 122. Frequency is simply how often the step-size decays. Table 6: Hyperparameters for all experiments."
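
The Baseline schedule described above (step-size decays by 1/10 at epochs 81 and 122) maps directly onto PyTorch's MultiStepLR; the model, base learning rate, and total epoch count below are placeholders, not values confirmed by the paper.

```python
import torch
from torch.optim.lr_scheduler import MultiStepLR

model = torch.nn.Linear(10, 10)                            # placeholder model
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)    # placeholder base step-size
scheduler = MultiStepLR(optimizer, milestones=[81, 122], gamma=0.1)

for epoch in range(160):    # placeholder total epoch count
    # ... one epoch of local training would run here ...
    scheduler.step()        # lr: 0.1 until epoch 81, then 0.01, then 0.001 after 122
```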