Federated Dynamic Sparse Training: Computing Less, Communicating Less, Yet Learning Better

Authors: Sameer Bibikar, Haris Vikalo, Zhangyang Wang, Xiaohan Chen

AAAI 2022, pp. 6080-6088 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this paper, we develop, implement, and experimentally validate a novel FL framework termed Federated Dynamic Sparse Training (FedDST) by which complex neural networks can be deployed and trained with substantially improved efficiency in both on-device computation and in-network communication. [...] FedDST consistently outperforms competing algorithms in our experiments: for instance, at any fixed upload data cap on non-iid CIFAR-10, it gains an impressive accuracy advantage of 10% over FedAvgM when given the same upload data cap; the accuracy gap remains 3% even when FedAvgM is given 2× the upload data cap, further demonstrating efficacy of FedDST. (An upload-size accounting sketch follows the table.)
Researcher Affiliation | Academia | Sameer Bibikar, Haris Vikalo, Zhangyang Wang, Xiaohan Chen*, Department of Electrical and Computer Engineering, The University of Texas at Austin; {bibikar,hvikalo,atlaswang,xiaohan.chen}@utexas.edu
Pseudocode | Yes | Algorithm 1: Overview of the proposed Federated Dynamic Sparse Training (FedDST). (A round-level sketch follows the table.)
Open Source Code | Yes | Code is available at: https://github.com/bibikar/feddst.
Open Datasets | Yes | We use MNIST (LeCun, Cortes, and Burges 2010) and CIFAR-10 (Krizhevsky 2009) datasets distributed among clients in a pathologically non-iid setting, similar to (McMahan et al. 2017) and matching the datasets used in (Li et al. 2020). We assume a total pool of 400 clients. Each client is assigned 2 classes and given 20 training images from each class. To distribute CIFAR-100 (Krizhevsky 2009) in a non-iid fashion, we use a Dirichlet(0.1) distribution for each class to distribute its samples among 400 clients, as in (Wang et al. 2020; Li, He, and Song 2021; Wang et al. 2021). (A partitioning sketch follows the table.)
Dataset Splits | No | The paper describes how training data is distributed to clients but does not explicitly provide details about a separate validation set split or its use.
Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., GPU models, CPU models, memory) used for running the experiments.
Software Dependencies | No | The paper does not list specific software dependencies with their version numbers.
Experiment Setup | Yes | We use MNIST (LeCun, Cortes, and Burges 2010) and CIFAR-10 (Krizhevsky 2009) datasets distributed among clients in a pathologically non-iid setting, similar to (McMahan et al. 2017) and matching the datasets used in (Li et al. 2020). We assume a total pool of 400 clients. Each client is assigned 2 classes and given 20 training images from each class. [...] We fix S = 0.8 for sparse methods, α = 0.05 for DST methods, and µ = 1 for the proximal penalty. [...] 20 clients/round, 10 epochs/round, [learning rate] = 0.01. (These values reappear in the round-level sketch below.)
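
The Pseudocode and Experiment Setup rows quote only the caption of Algorithm 1 and a handful of hyperparameters, so the following is a rough, illustrative sketch of one sparse federated round with mask-aware averaging and magnitude re-pruning, parameterized by the quoted values (S = 0.8, α = 0.05, µ = 1, 20 clients/round, 10 epochs/round). The function names and aggregation details are assumptions made for illustration, not the authors' reference implementation (which is at https://github.com/bibikar/feddst).

```python
import numpy as np

# Hyperparameters quoted in the Experiment Setup row; the learning-rate symbol
# was garbled in extraction, so "lr" below is a hedged label for the 0.01 value.
CONFIG = {
    "sparsity_S": 0.8,        # target fraction of zero weights
    "readjust_alpha": 0.05,   # readjustment ratio for DST mask updates
    "prox_mu": 1.0,           # proximal penalty weight
    "clients_per_round": 20,
    "epochs_per_round": 10,
    "lr": 0.01,
}

def top_k_mask(weights, sparsity):
    """Keep only the largest-magnitude (1 - sparsity) fraction of weights."""
    k = int(round((1.0 - sparsity) * weights.size))
    mask = np.zeros(weights.shape, dtype=bool)
    if k > 0:
        flat = mask.ravel()
        idx = np.argpartition(np.abs(weights).ravel(), -k)[-k:]
        flat[idx] = True
    return mask

def aggregate_round(global_w, client_updates, sparsity):
    """Sparse weighted averaging of client weights, followed by magnitude
    re-pruning of the aggregate back to the target sparsity.

    client_updates: list of (weights, mask, n_samples); each `weights` array
    is zero outside the corresponding boolean `mask`.
    """
    total = sum(n for _, _, n in client_updates)
    num = np.zeros_like(global_w)
    cov = np.zeros_like(global_w)
    for w, m, n in client_updates:
        num += (n / total) * (w * m)
        cov += (n / total) * m
    # Where no participating client kept a weight, keep the previous global value.
    averaged = np.where(cov > 0, num / np.maximum(cov, 1e-12), global_w)
    new_mask = top_k_mask(averaged, sparsity)
    return averaged * new_mask, new_mask
```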
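
The Open Datasets row describes two partitioning schemes: a pathological split for MNIST/CIFAR-10 (each of 400 clients gets 2 classes and 20 images per class) and a per-class Dirichlet(0.1) split for CIFAR-100. The sketch below shows one plausible way to generate such partitions; the function names and edge-case handling are my own simplifications and are not taken from the feddst repository.

```python
import numpy as np

def pathological_split(labels, n_clients=400, classes_per_client=2,
                       images_per_class=20, seed=0):
    """Pathological non-iid split: each client receives a few classes and a
    fixed number of images per class (MNIST / CIFAR-10 setting quoted above).
    For simplicity, this sketch does not rebalance when a class pool runs out."""
    rng = np.random.default_rng(seed)
    pools = {c: rng.permutation(np.flatnonzero(labels == c)).tolist()
             for c in np.unique(labels)}
    classes = list(pools)
    clients = []
    for _ in range(n_clients):
        chosen = rng.choice(classes, size=classes_per_client, replace=False)
        idx = []
        for c in chosen:
            idx.extend(pools[c][:images_per_class])
            pools[c] = pools[c][images_per_class:]
        clients.append(idx)
    return clients

def dirichlet_split(labels, n_clients=400, alpha=0.1, seed=0):
    """Dirichlet(alpha) non-iid split: each class is spread across clients
    with Dirichlet-sampled proportions (CIFAR-100 setting quoted above)."""
    rng = np.random.default_rng(seed)
    clients = [[] for _ in range(n_clients)]
    for c in np.unique(labels):
        idx = rng.permutation(np.flatnonzero(labels == c))
        proportions = rng.dirichlet(alpha * np.ones(n_clients))
        cuts = (np.cumsum(proportions)[:-1] * len(idx)).astype(int)
        for client_id, part in enumerate(np.split(idx, cuts)):
            clients[client_id].extend(part.tolist())
    return clients
```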
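
The Research Type row compares methods "at any fixed upload data cap", which presupposes a way of counting the upload cost of a sparse model. The quoted text does not specify the encoding, so the sketch below assumes a common accounting: nonzero weight values plus a one-bit-per-parameter binary mask.

```python
def upload_bytes(n_params, sparsity, bytes_per_value=4):
    """Approximate upload payload: nonzero float values plus a 1-bit-per-parameter
    binary mask, compared against a dense upload of the full model."""
    dense = n_params * bytes_per_value
    nonzero = int(round((1.0 - sparsity) * n_params))
    sparse = nonzero * bytes_per_value + (n_params + 7) // 8  # values + bitmask
    return dense, sparse

# Example: at S = 0.8, the per-round upload is roughly 4-5x smaller than dense.
dense, sparse = upload_bytes(n_params=1_000_000, sparsity=0.8)
print(f"dense: {dense / 1e6:.1f} MB, sparse: {sparse / 1e6:.2f} MB")
```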