SWIFT: Rapid Decentralized Federated Learning via Wait-Free Model Communication

Authors: Marco Bornstein, Tahseen Rabbani, Evan Z Wang, Amrit Bedi, Furong Huang

ICLR 2023

Each entry below lists a reproducibility variable, its result, and the LLM response supporting that judgment.

Research Type: Experimental
LLM Response: "Our experimental results demonstrate that SWIFT's run-time is reduced due to a large reduction in communication time per epoch, which falls by an order of magnitude compared to synchronous counterparts. Furthermore, SWIFT produces loss levels for image classification, over IID and non-IID data settings, upwards of 50% faster than existing SOTA algorithms. Below, we perform image classification experiments for a range of decentralized FL algorithms (Krizhevsky et al., 2009)."

Researcher Affiliation: Academia
LLM Response: "Marco Bornstein, Tahseen Rabbani, Evan Wang, Amrit Singh Bedi, & Furong Huang. Department of Computer Science, University of Maryland. {marcob, trabbani, ezw, amritbd, furongh}@umd.edu"

Pseudocode: Yes
LLM Response: "A diagram and algorithmic block of SWIFT are depicted in Figure 1 and Algorithm 1, respectively. Algorithm 1: Shared WaIt-Free Transmission (SWIFT)"
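
To make the wait-free communication pattern concrete, the sketch below mimics the flow of Algorithm 1 with mpi4py: each client runs a local update, posts non-blocking sends of its model to its neighbors, drains whatever neighbor models have already arrived without ever blocking, and averages with the freshest copies it holds. This is an illustrative toy (ring topology, uniform mixing weights, a quadratic toy objective), not the authors' implementation.

```python
# Run with e.g.: mpiexec -n 4 python swift_sketch.py
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

# Hypothetical ring topology: each client exchanges models with two neighbors.
neighbors = [(rank - 1) % size, (rank + 1) % size]
w = 1.0 / (len(neighbors) + 1)   # uniform mixing weight over self + neighbors

dim, lr = 10, 0.1
model = np.random.rand(dim)                    # toy "model" parameters
latest = {j: model.copy() for j in neighbors}  # last model received per neighbor
pending = []                                   # outstanding non-blocking sends

for step in range(100):
    grad = 2.0 * model        # toy gradient (objective ||x||^2)
    model -= lr * grad

    # Wait-free send: post non-blocking sends of a snapshot and move on.
    for j in neighbors:
        pending.append(comm.Isend(model.copy(), dest=j, tag=0))

    # Wait-free receive: drain whatever has already arrived; never block.
    for j in neighbors:
        while comm.Iprobe(source=j, tag=0):
            comm.Recv(latest[j], source=j, tag=0)

    # Average the local model with the freshest neighbor copies available.
    model = w * model + w * sum(latest[j] for j in neighbors)

# Simplified shutdown: drain stragglers, then complete the posted sends.
# A real implementation would manage request completion more carefully.
comm.Barrier()
for j in neighbors:
    while comm.Iprobe(source=j, tag=0):
        comm.Recv(latest[j], source=j, tag=0)
MPI.Request.Waitall(pending)
print(f"client {rank}: final ||model|| = {np.linalg.norm(model):.4f}")
```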

Open Source Code: Yes
LLM Response: "Code for SWIFT can be found on GitHub at https://github.com/umd-huang-lab/SWIFT."

Open Datasets: Yes
LLM Response: "As stated in Section 6, we perform image classification experiments on the CIFAR-10 dataset (Krizhevsky et al., 2009)."
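
For reference, CIFAR-10 is straightforward to obtain through torchvision; the snippet below is one standard way to load the train and test sets (the normalization constants are common CIFAR-10 statistics, not values taken from the paper).

```python
import torchvision
import torchvision.transforms as T

transform = T.Compose([
    T.ToTensor(),
    T.Normalize((0.4914, 0.4822, 0.4465), (0.2470, 0.2435, 0.2616)),
])

train_set = torchvision.datasets.CIFAR10(
    root="./data", train=True, download=True, transform=transform)
test_set = torchvision.datasets.CIFAR10(
    root="./data", train=False, download=True, transform=transform)

print(len(train_set), len(test_set))  # 50000 train / 10000 test images
```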

Dataset Splits: No
LLM Response: The paper discusses how the training data is partitioned among clients and mentions a test set, but it does not provide specific details or percentages for a separate validation split.
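
A reproducer who wants a validation set therefore has to choose a split themselves. A hypothetical 90/10 split of the CIFAR-10 training set (continuing from the snippet above; the 45000/5000 sizes and the seed are assumptions, not values from the paper) could look like this:

```python
import torch
from torch.utils.data import random_split

# `train_set` is the CIFAR-10 training set loaded in the snippet above.
generator = torch.Generator().manual_seed(42)   # fixed seed so the split is reproducible
train_subset, val_subset = random_split(train_set, [45000, 5000], generator=generator)
print(len(train_subset), len(val_subset))       # 45000 / 5000
```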

Hardware Specification: Yes
LLM Response: "Each node has an NVIDIA GeForce RTX 2080 Ti GPU."

Software Dependencies: No
LLM Response: The paper mentions Python, Open MPI, mpi4py, and PyTorch but does not provide specific version numbers for any of these software dependencies.
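
Absent pinned versions, the best a reproducer can do is record their own environment; the snippet below prints the version strings that each of the named packages exposes.

```python
import sys
import torch
import mpi4py
from mpi4py import MPI

print("Python :", sys.version.split()[0])
print("PyTorch:", torch.__version__)
print("mpi4py :", mpi4py.__version__)
print("MPI lib:", MPI.Get_library_version().strip())  # reports the underlying MPI (e.g. Open MPI)
```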

Experiment Setup: Yes
LLM Response: "In Table 6, one can see that the step-size decay column is split into the following sections: rate, E, and frequency. The rate is the decay rate for the step-size. For example, in the Baseline row, the step-size decays by 1/10. The term E is the epoch at which decay begins during training. For example, in the Baseline row, the step-size decays at E = 81 and 122. Frequency is simply how often the step-size decays. Table 6: Hyperparameters for all experiments."
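
The Baseline schedule described above (step-size decays by 1/10 at epochs 81 and 122) maps directly onto PyTorch's MultiStepLR; the model, base learning rate, and total epoch count below are placeholders, not values confirmed by the paper.

```python
import torch
from torch.optim.lr_scheduler import MultiStepLR

model = torch.nn.Linear(10, 10)                            # placeholder model
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)    # placeholder base step-size
scheduler = MultiStepLR(optimizer, milestones=[81, 122], gamma=0.1)

for epoch in range(160):    # placeholder total epoch count
    # ... one epoch of local training would run here ...
    scheduler.step()        # lr: 0.1 until epoch 81, then 0.01, then 0.001 after 122
```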