FedChain: Chained Algorithms for Near-optimal Communication Cost in Federated Learning
Authors: Charlie Hou, Kiran Koshy Thekumparampil, Giulia Fanti, Sewoong Oh
ICLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirical results support this theoretical gain over existing methods. We evaluate the utility of our framework in strongly convex and nonconvex settings. We compare the communication-round complexities of four baselines: FedAvg, SGD, ASG, and SCAFFOLD; the stepsize-decaying variants of these algorithms (which are prefixed by M- in plots); and the various instantiations of FedChain (Algo. 1). A minimal sketch of the chaining pattern appears after this table. Convex Optimization (Logistic Regression). We first study federated regularized logistic regression, which is strongly convex (objective function in App. I.1). In this experiment, we use the MNIST dataset of handwritten digits (LeCun et al., 2010). |
| Researcher Affiliation | Academia | Charlie Hou, Department of Electrical and Computer Engineering, Carnegie Mellon University, Pittsburgh, Pennsylvania, USA (charlieh@andrew.cmu.edu); Kiran K. Thekumparampil, Department of Electrical and Computer Engineering, University of Illinois at Urbana-Champaign, Champaign, Illinois, USA (thekump2@illinois.edu); Giulia Fanti, Department of Electrical and Computer Engineering, Carnegie Mellon University, Pittsburgh, Pennsylvania, USA (gfanti@andrew.cmu.edu); Sewoong Oh, Allen School of Computer Science and Engineering, University of Washington, Seattle, Washington, USA (sewoong@cs.washington.edu) |
| Pseudocode | Yes | Algorithm 1: Federated Chaining (FedChain); Algorithm 2: SGD; Algorithm 3: ASG; Algorithm 4: FedAvg; Algorithm 5: SAGA; Algorithm 6: SSNM |
| Open Source Code | No | The paper does not provide an explicit statement or link for open-source code for the described methodology. |
| Open Datasets | Yes | We use the MNIST dataset of handwritten digits (LeCun et al., 2010). We started with digit classification over EMNIST (Cohen et al., 2017)... We also considered image classification with ResNet-18 on CIFAR-100 (Krizhevsky, 2009). |
| Dataset Splits | No | The paper specifies train and test set sizes/clients but does not explicitly mention a validation set or details about dataset splits for training, validation, and testing beyond the given train/test counts. |
| Hardware Specification | No | The paper does not provide any specific hardware details such as GPU/CPU models, memory, or cloud instance types used for the experiments. |
| Software Dependencies | No | The paper does not specify any software dependencies with version numbers (e.g., Python version, library versions like PyTorch or TensorFlow). |
| Experiment Setup | Yes | Hyperparameters: all experiments initialize iterates at 0 with regularization µ = 0.1. We fix the total number of rounds R (it differs across experiments). For algorithms without stepsize decay, the tuning process is as follows. We tune all stepsizes η over the grid {10^-3, 10^-2.5, 10^-2, 10^-1.5, 10^-1}. We tune the percentage of rounds before switching from A_local to A_global (if applicable) over {10^-2, 10^-1.625, 10^-1.25, 10^-0.875, 10^-0.5}; a sketch of this grid search appears after the table. We set R = 500, and rather than fixing the number of local steps, we fix the number of local client epochs (the number of times a client performs gradient descent over its own dataset) at 20. The number of clients sampled per round is 10. |
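To make the chaining structure concrete, below is a minimal Python sketch of the pattern described in the Research Type row: FedChain instantiated with FedAvg as the local phase and plain SGD as the global phase, on toy quadratic client objectives. The objectives, helper names, and constants are illustrative assumptions, and the sketch omits parts of Algorithm 1 (e.g., the selection step between phases); it is not the authors' implementation.

```python
import numpy as np

# Toy quadratic client objectives f_i(x) = 0.5 * ||x - c_i||^2,
# so grad f_i(x) = x - c_i. Purely illustrative.
rng = np.random.default_rng(0)
NUM_CLIENTS, DIM = 50, 5
centers = rng.normal(size=(NUM_CLIENTS, DIM))  # one optimum per client

def client_grad(i, x):
    return x - centers[i]

def fedavg_round(x, eta, local_steps, sampled):
    # Local phase: each sampled client takes several local gradient steps,
    # then the server averages the resulting iterates.
    updates = []
    for i in sampled:
        xi = x.copy()
        for _ in range(local_steps):
            xi -= eta * client_grad(i, xi)
        updates.append(xi)
    return np.mean(updates, axis=0)

def sgd_round(x, eta, sampled):
    # Global phase: one mini-batch SGD step on the averaged gradient.
    g = np.mean([client_grad(i, x) for i in sampled], axis=0)
    return x - eta * g

def fedchain(x0, rounds=500, switch_frac=10**-1.25, eta=10**-1.5,
             local_steps=20, clients_per_round=10):
    # Chaining: run the local method (FedAvg) for the first switch_frac
    # fraction of rounds, then hand the iterate to the global method (SGD).
    x = x0.copy()
    switch_round = int(switch_frac * rounds)
    for r in range(rounds):
        sampled = rng.choice(NUM_CLIENTS, size=clients_per_round, replace=False)
        x = (fedavg_round(x, eta, local_steps, sampled) if r < switch_round
             else sgd_round(x, eta, sampled))
    return x

print("final iterate:", fedchain(np.zeros(DIM)))
```

Here `switch_frac` plays the role of the tuned "percentage of rounds before switching from A_local to A_global" in the Experiment Setup row.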
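The tuning grids in the Experiment Setup row are log-spaced; a small sketch of the implied grid search follows. The `run_fn` interface and the toy loss are assumptions for illustration, standing in for a full FedChain training run.

```python
import itertools

# Log-spaced tuning grids from the Experiment Setup row.
STEPSIZES = [10**e for e in (-3.0, -2.5, -2.0, -1.5, -1.0)]
SWITCH_FRACS = [10**e for e in (-2.0, -1.625, -1.25, -0.875, -0.5)]

def grid_search(run_fn):
    # run_fn(eta, switch_frac) -> final loss; a hypothetical stand-in for
    # training FedChain for R = 500 rounds with the given hyperparameters.
    return min(itertools.product(STEPSIZES, SWITCH_FRACS),
               key=lambda cfg: run_fn(*cfg))

# Toy stand-in loss, purely to make the sketch runnable.
eta, frac = grid_search(lambda eta, frac: (eta - 0.03)**2 + frac)
print(f"best stepsize: {eta:.4g}, best switch fraction: {frac:.4g}")
```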