Federated Dynamic Sparse Training: Computing Less, Communicating Less, Yet Learning Better

Authors: Sameer Bibikar, Haris Vikalo, Zhangyang Wang, Xiaohan Chen

AAAI 2022, pp. 6080-6088 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this paper, we develop, implement, and experimentally validate a novel FL framework termed Federated Dynamic Sparse Training (FedDST) by which complex neural networks can be deployed and trained with substantially improved efficiency in both on-device computation and in-network communication. [...] FedDST consistently outperforms competing algorithms in our experiments: for instance, at any fixed upload data cap on non-iid CIFAR-10, it gains an impressive accuracy advantage of 10% over FedAvgM when given the same upload data cap; the accuracy gap remains 3% even when FedAvgM is given 2× the upload data cap, further demonstrating efficacy of FedDST. (An upload-size accounting sketch follows the table.)
Researcher Affiliation | Academia | Sameer Bibikar, Haris Vikalo, Zhangyang Wang, Xiaohan Chen*, Department of Electrical and Computer Engineering, The University of Texas at Austin; {bibikar,hvikalo,atlaswang,xiaohan.chen}@utexas.edu
Pseudocode | Yes | Algorithm 1: Overview of the proposed Federated Dynamic Sparse Training (FedDST). (A round-level sketch follows the table.)
Open Source Code | Yes | Code is available at: https://github.com/bibikar/feddst.
Open Datasets | Yes | We use MNIST (LeCun, Cortes, and Burges 2010) and CIFAR-10 (Krizhevsky 2009) datasets distributed among clients in a pathologically non-iid setting, similar to (McMahan et al. 2017) and matching the datasets used in (Li et al. 2020). We assume a total pool of 400 clients. Each client is assigned 2 classes and given 20 training images from each class. To distribute CIFAR-100 (Krizhevsky 2009) in a non-iid fashion, we use a Dirichlet(0.1) distribution for each class to distribute its samples among 400 clients, as in (Wang et al. 2020; Li, He, and Song 2021; Wang et al. 2021). (A partitioning sketch follows the table.)
Dataset Splits | No | The paper describes how training data is distributed to clients but does not explicitly provide details about a separate validation set split or its use.
Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., GPU models, CPU models, memory) used for running the experiments.
Software Dependencies | No | The paper does not list specific software dependencies with their version numbers.
Experiment Setup | Yes | We use MNIST (LeCun, Cortes, and Burges 2010) and CIFAR-10 (Krizhevsky 2009) datasets distributed among clients in a pathologically non-iid setting, similar to (McMahan et al. 2017) and matching the datasets used in (Li et al. 2020). We assume a total pool of 400 clients. Each client is assigned 2 classes and given 20 training images from each class. [...] We fix S = 0.8 for sparse methods, α = 0.05 for DST methods, and µ = 1 for the proximal penalty. [...] 20 clients/round, 10 epochs/round, [learning rate] = 0.01. (These values reappear in the round-level sketch below.)
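
The Pseudocode and Experiment Setup rows quote only the caption of Algorithm 1 and a handful of hyperparameters, so the following is a rough, illustrative sketch of one sparse federated round with mask-aware averaging and magnitude re-pruning, parameterized by the quoted values (S = 0.8, α = 0.05, µ = 1, 20 clients/round, 10 epochs/round). The function names and aggregation details are assumptions made for illustration, not the authors' reference implementation (which is at https://github.com/bibikar/feddst).

```python
import numpy as np

# Hyperparameters quoted in the Experiment Setup row; the learning-rate symbol
# was garbled in extraction, so "lr" below is a hedged label for the 0.01 value.
CONFIG = {
    "sparsity_S": 0.8,        # target fraction of zero weights
    "readjust_alpha": 0.05,   # readjustment ratio for DST mask updates
    "prox_mu": 1.0,           # proximal penalty weight
    "clients_per_round": 20,
    "epochs_per_round": 10,
    "lr": 0.01,
}

def top_k_mask(weights, sparsity):
    """Keep only the largest-magnitude (1 - sparsity) fraction of weights."""
    k = int(round((1.0 - sparsity) * weights.size))
    mask = np.zeros(weights.shape, dtype=bool)
    if k > 0:
        flat = mask.ravel()
        idx = np.argpartition(np.abs(weights).ravel(), -k)[-k:]
        flat[idx] = True
    return mask

def aggregate_round(global_w, client_updates, sparsity):
    """Sparse weighted averaging of client weights, followed by magnitude
    re-pruning of the aggregate back to the target sparsity.

    client_updates: list of (weights, mask, n_samples); each `weights` array
    is zero outside the corresponding boolean `mask`.
    """
    total = sum(n for _, _, n in client_updates)
    num = np.zeros_like(global_w)
    cov = np.zeros_like(global_w)
    for w, m, n in client_updates:
        num += (n / total) * (w * m)
        cov += (n / total) * m
    # Where no participating client kept a weight, keep the previous global value.
    averaged = np.where(cov > 0, num / np.maximum(cov, 1e-12), global_w)
    new_mask = top_k_mask(averaged, sparsity)
    return averaged * new_mask, new_mask
```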
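
The Open Datasets row describes two partitioning schemes: a pathological split for MNIST/CIFAR-10 (each of 400 clients gets 2 classes and 20 images per class) and a per-class Dirichlet(0.1) split for CIFAR-100. The sketch below shows one plausible way to generate such partitions; the function names and edge-case handling are my own simplifications and are not taken from the feddst repository.

```python
import numpy as np

def pathological_split(labels, n_clients=400, classes_per_client=2,
                       images_per_class=20, seed=0):
    """Pathological non-iid split: each client receives a few classes and a
    fixed number of images per class (MNIST / CIFAR-10 setting quoted above).
    For simplicity, this sketch does not rebalance when a class pool runs out."""
    rng = np.random.default_rng(seed)
    pools = {c: rng.permutation(np.flatnonzero(labels == c)).tolist()
             for c in np.unique(labels)}
    classes = list(pools)
    clients = []
    for _ in range(n_clients):
        chosen = rng.choice(classes, size=classes_per_client, replace=False)
        idx = []
        for c in chosen:
            idx.extend(pools[c][:images_per_class])
            pools[c] = pools[c][images_per_class:]
        clients.append(idx)
    return clients

def dirichlet_split(labels, n_clients=400, alpha=0.1, seed=0):
    """Dirichlet(alpha) non-iid split: each class is spread across clients
    with Dirichlet-sampled proportions (CIFAR-100 setting quoted above)."""
    rng = np.random.default_rng(seed)
    clients = [[] for _ in range(n_clients)]
    for c in np.unique(labels):
        idx = rng.permutation(np.flatnonzero(labels == c))
        proportions = rng.dirichlet(alpha * np.ones(n_clients))
        cuts = (np.cumsum(proportions)[:-1] * len(idx)).astype(int)
        for client_id, part in enumerate(np.split(idx, cuts)):
            clients[client_id].extend(part.tolist())
    return clients
```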
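
The Research Type row compares methods "at any fixed upload data cap", which presupposes a way of counting the upload cost of a sparse model. The quoted text does not specify the encoding, so the sketch below assumes a common accounting: nonzero weight values plus a one-bit-per-parameter binary mask.

```python
def upload_bytes(n_params, sparsity, bytes_per_value=4):
    """Approximate upload payload: nonzero float values plus a 1-bit-per-parameter
    binary mask, compared against a dense upload of the full model."""
    dense = n_params * bytes_per_value
    nonzero = int(round((1.0 - sparsity) * n_params))
    sparse = nonzero * bytes_per_value + (n_params + 7) // 8  # values + bitmask
    return dense, sparse

# Example: at S = 0.8, the per-round upload is roughly 4-5x smaller than dense.
dense, sparse = upload_bytes(n_params=1_000_000, sparsity=0.8)
print(f"dense: {dense / 1e6:.1f} MB, sparse: {sparse / 1e6:.2f} MB")
```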