FedChain: Chained Algorithms for Near-optimal Communication Cost in Federated Learning

Authors: Charlie Hou, Kiran Koshy Thekumparampil, Giulia Fanti, Sewoong Oh

ICLR 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Empirical results support this theoretical gain over existing methods. We evaluate the utility of our framework in strongly convex and nonconvex settings. We compare the communication round complexities of four baselines: FedAvg, SGD, ASG, and SCAFFOLD; the stepsize-decaying variants of these algorithms (which are prefixed by 'M-' in plots); and the various instantiations of FedChain (Algo. 1). Convex optimization (logistic regression): We first study federated regularized logistic regression, which is strongly convex (objective function in App. I.1). In this experiment, we use the MNIST dataset of handwritten digits (LeCun et al., 2010)." (The assumed form of this objective is sketched after this table.)
Researcher Affiliation | Academia | Charlie Hou, Department of Electrical and Computer Engineering, Carnegie Mellon University, Pittsburgh, Pennsylvania, USA (charlieh@andrew.cmu.edu); Kiran K. Thekumparampil, Department of Electrical and Computer Engineering, University of Illinois at Urbana-Champaign, Champaign, Illinois, USA (thekump2@illinois.edu); Giulia Fanti, Department of Electrical and Computer Engineering, Carnegie Mellon University, Pittsburgh, Pennsylvania, USA (gfanti@andrew.cmu.edu); Sewoong Oh, Allen School of Computer Science and Engineering, University of Washington, Seattle, Washington, USA (sewoong@cs.washington.edu)
Pseudocode | Yes | Algorithm 1: Federated Chaining (FedChain); Algorithm 2: SGD; Algorithm 3: ASG; Algorithm 4: FedAvg; Algorithm 5: SAGA; Algorithm 6: SSNM. (A minimal sketch of the chaining template in Algorithm 1 follows this table.)
Open Source Code | No | The paper does not provide an explicit statement about, or link to, open-source code for the described methodology.
Open Datasets | Yes | "We use the MNIST dataset of handwritten digits (LeCun et al., 2010). We started with digit classification over EMNIST (Cohen et al., 2017)... We also considered image classification with ResNet-18 on CIFAR-100 (Krizhevsky, 2009)."
Dataset Splits | No | The paper specifies train/test set sizes and client counts but does not explicitly mention a validation set or describe training/validation/test splits beyond those counts.
Hardware Specification | No | The paper does not provide any specific hardware details such as GPU/CPU models, memory, or cloud instance types used for the experiments.
Software Dependencies | No | The paper does not specify any software dependencies with version numbers (e.g., Python version, or library versions such as PyTorch or TensorFlow).
Experiment Setup | Yes | "Hyperparameters. All experiments initialize iterates at 0 with regularization µ = 0.1. We fix the total number of rounds R (differs across experiments). For algorithms without stepsize decay, the tuning process is as follows. We tune all stepsizes η in the range {10^{-3}, 10^{-2.5}, 10^{-2}, 10^{-1.5}, 10^{-1}}. We tune the percentage of rounds before switching from A_local to A_global (if applicable) as {10^{-2}, 10^{-1.625}, 10^{-1.25}, 10^{-0.875}, 10^{-0.5}}. We set R = 500, and rather than fixing the number of local steps, we fix the number of local client epochs (the number of times a client performs gradient descent over its own dataset) to 20. The number of clients sampled per round is 10." (A hedged grid-search sketch of this tuning procedure follows below.)
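The strongly convex objective cited in the Research Type row is defined in App. I.1 of the paper; as a reading aid, here is the standard form of federated regularized logistic regression it refers to. The uniform client weighting and the exact placement of the regularizer are assumptions, not the paper's statement:

```latex
% Assumed standard form of federated regularized logistic regression
% (the paper's exact objective is in its App. I.1). Client i holds n_i
% examples (x_{ij}, y_{ij}) with labels y_{ij} in {-1, +1}; uniform
% client weights are an assumption, and mu = 0.1 per the setup row above.
\[
  F(w) \;=\; \frac{1}{N}\sum_{i=1}^{N} \frac{1}{n_i}\sum_{j=1}^{n_i}
  \log\!\left(1 + \exp\!\left(-y_{ij}\, x_{ij}^{\top} w\right)\right)
  \;+\; \frac{\mu}{2}\,\lVert w \rVert_2^{2}
\]
```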
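Algorithm 1 (FedChain) chains a local-update method with a global one: run A_local for a fraction of the round budget, keep the better of its output and the starting point, then finish with A_global. Below is a minimal runnable sketch of that template, assuming a toy least-squares problem, FedAvg as A_local, and minibatch SGD as A_global; the data, stepsizes, full client participation, and exact-loss comparison (the paper compares sampled function values) are illustrative assumptions, not the paper's setup.

```python
# Minimal sketch of the FedChain template (Algorithm 1), under the
# assumptions stated above; not the paper's implementation.
import numpy as np

rng = np.random.default_rng(0)

# Toy heterogeneous least-squares clients: client i holds (A_i, b_i).
clients = [(rng.normal(size=(20, 5)), rng.normal(size=20)) for _ in range(10)]

def client_grad(w, client):
    A, b = client
    return A.T @ (A @ w - b) / len(b)

def global_loss(w):
    return float(np.mean([0.5 * np.mean((A @ w - b) ** 2) for A, b in clients]))

def fedavg(w, rounds, local_steps=20, eta=0.01):
    """A_local: each client runs several local gradient steps per round,
    then the server averages the client iterates."""
    for _ in range(rounds):
        updates = []
        for c in clients:
            w_i = w.copy()
            for _ in range(local_steps):
                w_i = w_i - eta * client_grad(w_i, c)
            updates.append(w_i)
        w = np.mean(updates, axis=0)
    return w

def sgd(w, rounds, eta=0.05):
    """A_global: one stochastic step per round, using the gradient of a
    uniformly sampled client (unbiased for the global loss)."""
    for _ in range(rounds):
        w = w - eta * client_grad(w, clients[rng.integers(len(clients))])
    return w

def fedchain(w0, rounds=100, switch_frac=0.1):
    r_local = max(1, int(switch_frac * rounds))
    w_mid = fedavg(w0, r_local)                            # phase 1: local method
    # Keep the better of the starting point and the local-phase output
    # (the paper approximates this comparison with sampled function values).
    w_start = w_mid if global_loss(w_mid) <= global_loss(w0) else w0
    return sgd(w_start, rounds - r_local)                  # phase 2: global method

w_final = fedchain(np.zeros(5))
print("final global loss:", global_loss(w_final))
```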
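The tuning procedure quoted in the Experiment Setup row amounts to a grid search over the stepsize and the switch point. A minimal sketch follows, with the two grids copied verbatim from the quote; the evaluation function is a hypothetical stub standing in for one full training run (R = 500 rounds, 20 local epochs, 10 sampled clients per round).

```python
# Hedged sketch of the paper's hyperparameter grid search; the grids are
# from the quoted setup, the evaluation stub is an illustrative assumption.
import itertools

etas = [10 ** p for p in (-3.0, -2.5, -2.0, -1.5, -1.0)]
switch_fracs = [10 ** p for p in (-2.0, -1.625, -1.25, -0.875, -0.5)]

def final_loss(eta, switch_frac):
    """Hypothetical stub: in the paper, each configuration would be scored
    by a full training run. The dummy score only makes the sketch runnable."""
    return (eta - 0.01) ** 2 + (switch_frac - 0.1) ** 2

best_eta, best_frac = min(itertools.product(etas, switch_fracs),
                          key=lambda cfg: final_loss(*cfg))
print(f"selected stepsize={best_eta:.4g}, switch fraction={best_frac:.4g}")
```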