Federated Dynamic Sparse Training: Computing Less, Communicating Less, Yet Learning Better
Authors: Sameer Bibikar, Haris Vikalo, Zhangyang Wang, Xiaohan Chen
AAAI 2022, pp. 6080-6088 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this paper, we develop, implement, and experimentally validate a novel FL framework termed Federated Dynamic Sparse Training (FedDST) by which complex neural networks can be deployed and trained with substantially improved efficiency in both on-device computation and in-network communication. [...] FedDST consistently outperforms competing algorithms in our experiments: for instance, at any fixed upload data cap on non-iid CIFAR-10, it gains an impressive accuracy advantage of 10% over FedAvgM when given the same upload data cap; the accuracy gap remains 3% even when FedAvgM is given 2× the upload data cap, further demonstrating efficacy of FedDST. |
| Researcher Affiliation | Academia | Sameer Bibikar, Haris Vikalo, Zhangyang Wang, Xiaohan Chen* Department of Electrical and Computer Engineering, The University of Texas at Austin {bibikar,hvikalo,atlaswang,xiaohan.chen}@utexas.edu |
| Pseudocode | Yes | Algorithm 1: Overview of the proposed Federated Dynamic Sparse Training (FedDST). (A minimal sketch of the prune-and-regrow mask update underlying this algorithm appears below the table.) |
| Open Source Code | Yes | Code is available at: https://github.com/bibikar/feddst. |
| Open Datasets | Yes | We use MNIST (LeCun, Cortes, and Burges 2010) and CIFAR-10 (Krizhevsky 2009) datasets distributed among clients in a pathologically non-iid setting, similar to (McMahan et al. 2017) and matching the datasets used in (Li et al. 2020). We assume a total pool of 400 clients. Each client is assigned 2 classes and given 20 training images from each class. To distribute CIFAR-100 (Krizhevsky 2009) in a non-iid fashion, we use a Dirichlet(0.1) distribution for each class to distribute its samples among 400 clients, as in (Wang et al. 2020; Li, He, and Song 2021; Wang et al. 2021). (Both partitioning schemes are sketched in code below the table.) |
| Dataset Splits | No | The paper describes how training data is distributed to clients but does not explicitly provide details about a separate validation set split or its use. |
| Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., GPU models, CPU models, memory) used for running the experiments. |
| Software Dependencies | No | The paper does not list specific software dependencies with their version numbers. |
| Experiment Setup | Yes | We use MNIST (LeCun, Cortes, and Burges 2010) and CIFAR-10 (Krizhevsky 2009) datasets distributed among clients in a pathologically non-iid setting, similar to (McMahan et al. 2017) and matching the datasets used in (Li et al. 2020). We assume a total pool of 400 clients. Each client is assigned 2 classes and given 20 training images from each class. [...] We fix S = 0.8 for sparse methods, α = 0.05 for DST methods, and µ = 1 for the proximal penalty. [...] 20 clients/round, 10 epochs/round, learning rate = 0.01. (These settings are collected into a config sketch below the table.) |
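
The core of FedDST's Algorithm 1 is a dynamic sparse training step: periodically drop a small fraction of the weakest active weights and regrow the same number of connections. The snippet below is a minimal sketch of that prune-and-regrow idea only, not the paper's exact algorithm; the function name `readjust_mask`, the flattened per-layer representation, and the choice to regrow only among previously inactive positions are our assumptions.

```python
# Minimal sketch of a prune-and-regrow mask readjustment of the kind FedDST's
# Algorithm 1 builds on (dynamic sparse training in the RigL style). Names and
# the regrowth rule are illustrative assumptions, not taken from the paper.
import numpy as np

def readjust_mask(weights, grads, mask, alpha=0.05):
    """Drop the alpha-fraction of smallest-magnitude active weights,
    then regrow as many connections where the gradient magnitude is largest."""
    active = np.flatnonzero(mask)
    inactive = np.flatnonzero(mask == 0)
    n_adjust = int(alpha * active.size)
    if n_adjust == 0:
        return mask.copy()
    new_mask = mask.copy()
    # Prune: the n_adjust active weights with the smallest magnitude.
    prune_idx = active[np.argsort(np.abs(weights[active]))[:n_adjust]]
    new_mask[prune_idx] = 0
    # Regrow: the n_adjust previously inactive positions with the largest gradients.
    grow_idx = inactive[np.argsort(-np.abs(grads[inactive]))[:n_adjust]]
    new_mask[grow_idx] = 1
    return new_mask

# Toy usage at the paper's settings: sparsity S = 0.8, readjustment rate alpha = 0.05.
rng = np.random.default_rng(0)
w, g = rng.normal(size=1000), rng.normal(size=1000)
m = (rng.random(1000) > 0.8).astype(np.int8)   # roughly 20% of weights active
m_new = readjust_mask(w, g, m, alpha=0.05)
assert m_new.sum() == m.sum()                   # overall sparsity is preserved
```

Keeping the number of pruned and regrown connections equal is what lets the layer-wise sparsity stay fixed at S throughout training, which is the property that bounds both on-device computation and upload size.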
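The two non-iid client partitions described in the datasets row can be sketched as follows. This is our illustration of the described scheme, not the released code (see https://github.com/bibikar/feddst for the exact procedure); in particular, the excerpt does not say whether pathological shards are disjoint across clients, and this sketch samples them independently.

```python
# Sketch of the two client partitions described above: a "pathological" split
# (2 classes per client, 20 images per class, 400 clients) and a Dirichlet(0.1)
# split of each class across 400 clients. Function names are ours.
import numpy as np

def pathological_split(labels, n_clients=400, classes_per_client=2,
                       imgs_per_class=20, seed=0):
    """Each client draws `classes_per_client` classes and `imgs_per_class` images per class."""
    rng = np.random.default_rng(seed)
    by_class = {c: np.flatnonzero(labels == c) for c in np.unique(labels)}
    classes = np.array(list(by_class))
    clients = []
    for _ in range(n_clients):
        chosen = rng.choice(classes, size=classes_per_client, replace=False)
        idx = np.concatenate([rng.choice(by_class[c], size=imgs_per_class, replace=False)
                              for c in chosen])
        clients.append(idx)
    return clients

def dirichlet_split(labels, n_clients=400, alpha=0.1, seed=0):
    """Distribute each class across clients with Dirichlet(alpha) proportions."""
    rng = np.random.default_rng(seed)
    clients = [[] for _ in range(n_clients)]
    for c in np.unique(labels):
        idx = rng.permutation(np.flatnonzero(labels == c))
        proportions = rng.dirichlet(alpha * np.ones(n_clients))
        cuts = (np.cumsum(proportions)[:-1] * idx.size).astype(int)
        for client, shard in zip(clients, np.split(idx, cuts)):
            client.extend(shard.tolist())
    return [np.asarray(c) for c in clients]
```

A smaller Dirichlet concentration (here 0.1) makes each client's class distribution more skewed, which is what makes the CIFAR-100 setting strongly non-iid.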
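For convenience, the hyperparameters quoted in the setup row are gathered below. Only values stated in the excerpt are included; the dictionary name and the reading of the bare 0.01 as the learning rate are our assumptions.

```python
# Hyperparameters quoted in the experiment-setup excerpt, collected in one place.
# Everything else (model architecture, optimizer, data paths) would still need
# to be specified from the paper or the released code.
FEDDST_SETUP = {
    "n_clients": 400,        # total client pool
    "clients_per_round": 20,
    "local_epochs": 10,
    "learning_rate": 0.01,   # the 0.01 in the excerpt, assumed to be the learning rate
    "sparsity_S": 0.8,       # fixed sparsity for sparse methods
    "readjust_alpha": 0.05,  # mask readjustment rate for DST methods
    "prox_mu": 1.0,          # proximal penalty weight
}
```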