Recycling Model Updates in Federated Learning: Are Gradient Subspaces Low-Rank?

Authors: Sheikh Shams Azam, Seyyedali Hosseinalipour, Qiang Qiu, Christopher Brinton

ICLR 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our subsequent experimental results demonstrate the improvement LBGM obtains in communication overhead compared to conventional federated learning on several datasets and deep learning models. ... 4 EXPERIMENTS ... Figure 5: LBGM as a Standalone Algorithm. ... Figure 6: Effect of δ_k^threshold on LBGM. ... Figure 7: LBGM as a Plug-and-Play Algorithm. ... Figure 8: Application of LBGM as a plug-and-play algorithm on top of SignSGD in distributed training.
Researcher Affiliation | Academia | Sheikh Shams Azam, Seyyedali Hosseinalipour, Qiang Qiu, Christopher Brinton, School of ECE, Purdue University, {azam1, hosseina, qqiu, cgb}@purdue.edu
Pseudocode | Yes | Algorithm 1 LBGM: Look-back Gradient Multiplier ... Algorithm 2 Pseudocode for Preliminary Experiments in Section 2
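For orientation, below is a minimal Python sketch of the look-back idea that Algorithm 1 pseudocodes: a worker reuses the direction of the last gradient it transmitted in full and uploads a single scalar whenever the new gradient remains close to that direction. The function name `lbgm_worker_step` and the exact form of the acceptance test are illustrative assumptions, not a verbatim transcription of the authors' algorithm.

```python
import numpy as np

def lbgm_worker_step(grad, look_back, delta_threshold):
    """One illustrative LBGM-style communication decision (not the
    authors' exact Algorithm 1).

    grad:            current flattened local gradient
    look_back:       last full gradient this worker transmitted, or None
    delta_threshold: reuse threshold (uniform across workers per the paper)
    Returns (payload, new_look_back).
    """
    if look_back is not None:
        # Look-back coefficient: scalar projection of grad onto look_back.
        rho = (grad @ look_back) / (look_back @ look_back)
        # Fraction of gradient energy missed by the look-back direction.
        err = np.linalg.norm(grad - rho * look_back) ** 2 / (grad @ grad)
        if err <= delta_threshold:
            # Cheap round: upload one scalar instead of the full vector.
            return ("scalar", rho), look_back
    # Expensive round: upload the full gradient and refresh the look-back.
    return ("full", grad), grad
```

A "scalar" round costs one float of uplink instead of a full model-sized vector, which is where the reported communication savings would come from.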
Open Source Code | Yes | All of our code and hyperparameters are available at https://github.com/shams-sam/FedOptim.
Open Datasets | Yes | We start with 4 different NN architectures: (i) fully-connected neural network (FCN), (ii) convolutional neural network (CNN), (iii) ResNet18 (He et al., 2016), and (iv) VGG19 (Simonyan & Zisserman, 2014); trained on 2 datasets: CIFAR-10 (Krizhevsky et al., 2009) and CelebA (Liu et al., 2015), with classification and regression tasks, respectively. ... In Appendix E.1, we further find that (H1) holds in our experiments using several additional datasets: CIFAR-100 (Krizhevsky et al., 2009), MNIST (LeCun & Cortes, 2010), FMNIST (Xiao et al., 2017), CelebA (Liu et al., 2015), Pascal VOC (Everingham et al., 2010), COCO (Lin et al., 2014); models: U-Net (Ronneberger et al., 2015), SVM (Cortes & Vapnik, 1995); and tasks: segmentation and regression.
Dataset Splits | No | The paper uses standard datasets and mentions training and testing, but it does not explicitly state the percentages or counts used for training, validation, and test splits (e.g., "80% train, 10% validation, 10% test").
Hardware Specification | Yes | The FL system is simulated using PyTorch (Paszke et al., 2019) and PySyft (Ryffel et al., 2018) and trained on a 48GB Tesla-P100 GPU with 128GB RAM.
Software Dependencies | No | The paper mentions software such as PyTorch (Paszke et al., 2019) and PySyft (Ryffel et al., 2018), but it does not provide specific version numbers for these dependencies.
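Since no versions are stated, a re-run would have to pin them independently. One minimal way to record the versions actually installed, assuming the standard `__version__` attributes (PySyft's import name is `syft`):

```python
# Log dependency versions alongside experiment results so runs remain
# reproducible even though the paper states no version numbers.
import torch
import syft

print("torch", torch.__version__)
print("syft", syft.__version__)
```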
Experiment Setup | Yes | We consider an FL system consisting of 100 workers. We consider both the iid and non-iid data distributions among the workers. ... The workers train with mini-batch sizes ranging from 128 to 512 based on the choice of dataset. We implement LBGM with uniform δ_k^threshold across workers. We also use error feedback (Karimireddy et al., 2019) as standard only if top-K sparsification is used in the training. ... All of our code and hyperparameters are available at https://github.com/shams-sam/FedOptim.
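The setup row mentions iid and non-iid distributions over 100 workers without detailing the partitioning scheme. The sketch below uses the common label-sharding recipe (McMahan et al., 2017) as a stand-in; the paper's actual scheme lives in the linked FedOptim repository, so the function name and the `shards_per_worker` default are assumptions.

```python
import numpy as np

def partition(labels, num_workers=100, iid=True, shards_per_worker=2, seed=0):
    """Split example indices across workers.

    iid=True deals examples out uniformly at random; iid=False sorts by
    label and assigns a few contiguous shards per worker, so each worker
    sees only a handful of classes. Label sharding is the standard
    non-iid recipe (McMahan et al., 2017), used here as a stand-in for
    the paper's own scheme.
    """
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(labels))
    if iid:
        return np.array_split(idx, num_workers)
    # Sort the shuffled indices by label, then deal shards to workers.
    idx = idx[np.argsort(np.asarray(labels)[idx], kind="stable")]
    shards = np.array_split(idx, num_workers * shards_per_worker)
    order = rng.permutation(len(shards))
    return [np.concatenate([shards[i] for i in order[w::num_workers]])
            for w in range(num_workers)]
```

With CIFAR-10, 100 workers, and 2 shards each, every worker holds examples from only a few classes, a typical extreme non-iid setting.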