Achieving Linear Speedup with Partial Worker Participation in Non-IID Federated Learning

Authors: Haibo Yang, Minghong Fang, Jia Liu

ICLR 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We conduct extensive experiments on MNIST and CIFAR-10 to verify our theoretical results.
Researcher Affiliation | Academia | Haibo Yang, Minghong Fang, and Jia Liu, Department of Electrical and Computer Engineering, The Ohio State University, Columbus, OH 43210, USA. {yang.5952, fang.841, liu.1736}@osu.edu
Pseudocode | Yes | Algorithm 1: A Generalized FedAvg Algorithm with Two-Sided Learning Rates.
Open Source Code | No | The paper does not provide an explicit link to open-source code for the described methodology.
Open Datasets | Yes | We conduct extensive experiments on MNIST and CIFAR-10 to verify our theoretical results. We use three models: logistic regression (LR), a fully-connected neural network with two hidden layers (2NN), and a convolutional neural network (CNN) with the non-i.i.d. version of MNIST (LeCun et al., 1998), and one ResNet model with CIFAR-10 (Krizhevsky et al., 2009).
Dataset Splits | No | The paper discusses training and testing samples but does not specify a validation split (e.g., percentages or counts for a validation set).
Hardware Specification | Yes | We run the experiments using the same GPU (NVIDIA V100) to ensure the same conditions.
Software Dependencies | No | The paper does not provide specific version numbers for ancillary software dependencies.
Experiment Setup | Yes | In this section, we elaborate on the results under the non-i.i.d. MNIST dataset for the 2NN. We distribute the MNIST dataset among m = 100 workers randomly and evenly in a digit-based manner, such that the local dataset for each worker contains only a certain class of digits. The number of digits in each worker's dataset represents the non-i.i.d. degree. For digits_10, each worker has training/testing samples with ten digits from 0 to 9, which is essentially an i.i.d. case. For digits_1, each worker has samples associated with only one digit, which leads to highly non-i.i.d. datasets among workers. For partial worker participation, we set the number of workers to n = 10 in each communication round.
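The Algorithm 1 named in the pseudocode row uses two-sided learning rates: a local rate for each client's SGD steps and a global rate for the server's step along the averaged model delta, with only n of m workers sampled per round. A minimal sketch of that update pattern on a toy quadratic objective (the per-worker objectives, worker counts, and learning-rate values here are illustrative, not the paper's):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy quadratic objective per worker: f_i(x) = 0.5 * ||x - c_i||^2,
# so the local gradient at x is (x - c_i). The centers c_i differ
# across workers, mimicking non-i.i.d. local objectives.
m, d = 100, 5                       # total workers, model dimension
centers = rng.normal(size=(m, d))   # hypothetical per-worker optima

def local_grad(i, x):
    return x - centers[i]

def fedavg_two_sided(rounds=200, n=10, K=5, eta_local=0.1, eta_global=1.0):
    """Generalized FedAvg sketch with two-sided learning rates.

    Each round: sample n of m workers without replacement, run K local
    SGD steps with eta_local on each, then move the global model by
    eta_global times the average model delta.
    """
    x = np.zeros(d)
    for _ in range(rounds):
        sampled = rng.choice(m, size=n, replace=False)  # partial participation
        deltas = []
        for i in sampled:
            xi = x.copy()
            for _ in range(K):                  # local SGD steps
                xi -= eta_local * local_grad(i, xi)
            deltas.append(xi - x)               # model delta, not a gradient
        x = x + eta_global * np.mean(deltas, axis=0)  # server-side step
    return x

x_final = fedavg_two_sided()
# For these quadratics, the global objective is minimized at the mean
# of the centers; partial participation leaves the iterate in a small
# neighborhood of that point rather than exactly on it.
```

Setting eta_global = 1.0 and K = 1 recovers plain distributed SGD on the sampled workers; the two-sided form is what lets the server rate be tuned independently of the client rate.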
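The digit-based partition in the experiment-setup row can be sketched directly. Below is a minimal version of the digits_1 extreme, using a synthetic label array in place of real MNIST labels (the array and the helper name are illustrative, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for MNIST labels: 60,000 samples with digits 0-9. The real
# experiment would use the actual dataset's label array instead.
labels = rng.integers(0, 10, size=60000)

def digit_based_split(labels, m=100):
    """digits_1 partition: each of the m workers receives samples of
    exactly one digit, making local datasets highly non-i.i.d.
    (m is assumed to be a multiple of 10)."""
    shards = []
    for d in range(10):
        idx = np.flatnonzero(labels == d)   # all samples of digit d
        rng.shuffle(idx)                    # random, even assignment
        shards.extend(np.array_split(idx, m // 10))
    return shards

shards = digit_based_split(labels)
# The i.i.d. extreme (digits_10) would instead shuffle all indices and
# split them into m equal shards regardless of label; intermediate
# non-i.i.d. degrees assign k digit classes per worker.
```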