SWIFT: Rapid Decentralized Federated Learning via Wait-Free Model Communication
Authors: Marco Bornstein, Tahseen Rabbani, Evan Z Wang, Amrit Bedi, Furong Huang
ICLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experimental results demonstrate that SWIFT's run-time is reduced due to a large reduction in communication time per epoch, which falls by an order of magnitude compared to synchronous counterparts. Furthermore, SWIFT produces loss levels for image classification, over IID and non-IID data settings, upwards of 50% faster than existing SOTA algorithms. Below, we perform image classification experiments on CIFAR-10 (Krizhevsky et al., 2009) for a range of decentralized FL algorithms. |
| Researcher Affiliation | Academia | Marco Bornstein, Tahseen Rabbani, Evan Wang, Amrit Singh Bedi, & Furong Huang Department of Computer Science, University of Maryland {marcob, trabbani, ezw, amritbd, furongh}@umd.edu |
| Pseudocode | Yes | A diagram and algorithmic block of SWIFT are depicted in Figure 1 and Algorithm 1, respectively. Algorithm 1: Shared WaIt-Free Transmission (SWIFT). A hedged sketch of the wait-free communication loop is given below the table. |
| Open Source Code | Yes | Code for SWIFT can be found on GitHub at https://github.com/umd-huang-lab/SWIFT. |
| Open Datasets | Yes | As stated in Section 6, we perform image classification experiments on the CIFAR-10 dataset (Krizhevsky et al., 2009). |
| Dataset Splits | No | The paper discusses data partitioning for training among clients and mentions a 'test' set, but it does not provide specific details or percentages for a separate 'validation' split. |
| Hardware Specification | Yes | Each node has an NVIDIA GeForce RTX 2080 Ti GPU. |
| Software Dependencies | No | The paper mentions 'Python', 'Open MPI', 'mpi4py', and 'PyTorch' but does not provide specific version numbers for any of these software dependencies. |
| Experiment Setup | Yes | In Table 6, one can see that the step-size decay column is split into the following sections: rate, E, and frequency. The rate is the decay rate for the step-size; in the Baseline row, for example, the step-size decays by 1/10. The term E is the epoch at which decay begins during training; in the Baseline row, the step-size decays at E = 81 and 122. Frequency is simply how often the step-size decays. Table 6: Hyperparameters for all experiments. A minimal schedule sketch is given below the table. |
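As context for the Pseudocode row above, here is a minimal, hedged sketch of the wait-free neighbor communication that Algorithm 1 describes, not a reproduction of the paper's exact procedure. It uses mpi4py (part of the paper's reported stack) with a toy quadratic objective; the ring topology, step count, step-size, and 1/2 mixing weight are all assumptions for illustration.

```python
# Hedged sketch of wait-free model communication (not the paper's exact Algorithm 1).
# Run with: mpiexec -n 4 python swift_sketch.py
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()
neighbors = [(rank - 1) % size, (rank + 1) % size]  # assumed ring topology

model = np.random.rand(10)              # toy model: a flat parameter vector
recv_reqs = {j: comm.irecv(source=j) for j in neighbors}

for step in range(100):
    # Local SGD step on a toy quadratic objective (stands in for real training).
    model -= 0.01 * model               # gradient of 0.5 * ||model||^2 is model

    for j in neighbors:
        comm.isend(model, dest=j)       # non-blocking send; request dropped for brevity
        done, incoming = recv_reqs[j].test()
        if done:                                 # wait-free: average only what has
            model = 0.5 * (model + incoming)     # already arrived, never block
            recv_reqs[j] = comm.irecv(source=j)  # re-post the receive

for req in recv_reqs.values():          # tidy up outstanding receives before exit
    req.Cancel()
print(f"rank {rank}: final model norm {np.linalg.norm(model):.4f}")
```

The key design point is the `test()` call in place of `wait()`: a client mixes in whichever neighbor models have already arrived and otherwise proceeds immediately, which is what removes the synchronization barrier that dominates communication time in synchronous counterparts.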
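The step-size decay in the Experiment Setup row maps directly onto a standard PyTorch milestone schedule. Below is a minimal sketch of the Baseline row's schedule (decay by 1/10 at epochs 81 and 122, per Table 6); the stand-in linear model, the initial step-size of 0.1, and the total epoch count are assumptions, not values from the paper.

```python
import torch

# Minimal sketch of the Baseline step-size decay described above:
# rate = 1/10, applied at epochs E = 81 and 122.
model = torch.nn.Linear(3072, 10)  # stand-in for the paper's network (3072 = 32*32*3)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)  # initial lr assumed
scheduler = torch.optim.lr_scheduler.MultiStepLR(
    optimizer, milestones=[81, 122], gamma=0.1)

for epoch in range(160):                  # total epoch count assumed
    x = torch.randn(8, 3072)              # dummy batch in place of CIFAR-10
    loss = model(x).pow(2).mean()
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    scheduler.step()                      # lr: 0.1 -> 0.01 (E=81) -> 0.001 (E=122)
```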