FedChain: Chained Algorithms for Near-optimal Communication Cost in Federated Learning

Authors: Charlie Hou, Kiran Koshy Thekumparampil, Giulia Fanti, Sewoong Oh

ICLR 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Empirical results support this theoretical gain over existing methods. We evaluate the utility of our framework in strongly convex and nonconvex settings. We compare the communication round complexities of four baselines: FedAvg, SGD, ASG, and SCAFFOLD; the stepsize-decaying variants of these algorithms (which are prefixed by 'M-' in plots); and the various instantiations of FedChain (Algo. 1). Convex optimization (logistic regression): We first study federated regularized logistic regression, which is strongly convex (objective function in App. I.1). In this experiment, we use the MNIST dataset of handwritten digits (LeCun et al., 2010)." (The assumed form of this objective is sketched after this table.)
Researcher Affiliation | Academia | Charlie Hou, Department of Electrical and Computer Engineering, Carnegie Mellon University, Pittsburgh, Pennsylvania, USA (charlieh@andrew.cmu.edu); Kiran K. Thekumparampil, Department of Electrical and Computer Engineering, University of Illinois at Urbana-Champaign, Champaign, Illinois, USA (thekump2@illinois.edu); Giulia Fanti, Department of Electrical and Computer Engineering, Carnegie Mellon University, Pittsburgh, Pennsylvania, USA (gfanti@andrew.cmu.edu); Sewoong Oh, Allen School of Computer Science and Engineering, University of Washington, Seattle, Washington, USA (sewoong@cs.washington.edu)
Pseudocode | Yes | Algorithm 1: Federated Chaining (FedChain); Algorithm 2: SGD; Algorithm 3: ASG; Algorithm 4: FedAvg; Algorithm 5: SAGA; Algorithm 6: SSNM. (A minimal sketch of the chaining template in Algorithm 1 follows this table.)
Open Source Code | No | The paper does not provide an explicit statement about, or link to, open-source code for the described methodology.
Open Datasets | Yes | "We use the MNIST dataset of handwritten digits (LeCun et al., 2010). We started with digit classification over EMNIST (Cohen et al., 2017)... We also considered image classification with ResNet-18 on CIFAR-100 (Krizhevsky, 2009)."
Dataset Splits | No | The paper specifies train/test set sizes and client counts but does not explicitly mention a validation set or describe training/validation/test splits beyond those counts.
Hardware Specification | No | The paper does not provide any specific hardware details such as GPU/CPU models, memory, or cloud instance types used for the experiments.
Software Dependencies | No | The paper does not specify any software dependencies with version numbers (e.g., Python version, or library versions such as PyTorch or TensorFlow).
Experiment Setup | Yes | "Hyperparameters. All experiments initialize iterates at 0 with regularization µ = 0.1. We fix the total number of rounds R (differs across experiments). For algorithms without stepsize decay, the tuning process is as follows. We tune all stepsizes η in the range {10^{-3}, 10^{-2.5}, 10^{-2}, 10^{-1.5}, 10^{-1}}. We tune the percentage of rounds before switching from A_local to A_global (if applicable) as {10^{-2}, 10^{-1.625}, 10^{-1.25}, 10^{-0.875}, 10^{-0.5}}. We set R = 500, and rather than fixing the number of local steps, we fix the number of local client epochs (the number of times a client performs gradient descent over its own dataset) to 20. The number of clients sampled per round is 10." (A hedged grid-search sketch of this tuning procedure follows below.)
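The strongly convex objective cited in the Research Type row is defined in App. I.1 of the paper; as a reading aid, here is the standard form of federated regularized logistic regression it refers to. The uniform client weighting and the exact placement of the regularizer are assumptions, not the paper's statement:

```latex
% Assumed standard form of federated regularized logistic regression
% (the paper's exact objective is in its App. I.1). Client i holds n_i
% examples (x_{ij}, y_{ij}) with labels y_{ij} in {-1, +1}; uniform
% client weights are an assumption, and mu = 0.1 per the setup row above.
\[
  F(w) \;=\; \frac{1}{N}\sum_{i=1}^{N} \frac{1}{n_i}\sum_{j=1}^{n_i}
  \log\!\left(1 + \exp\!\left(-y_{ij}\, x_{ij}^{\top} w\right)\right)
  \;+\; \frac{\mu}{2}\,\lVert w \rVert_2^{2}
\]
```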
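Algorithm 1 (FedChain) chains a local-update method with a global one: run A_local for a fraction of the round budget, keep the better of its output and the starting point, then finish with A_global. Below is a minimal runnable sketch of that template, assuming a toy least-squares problem, FedAvg as A_local, and minibatch SGD as A_global; the data, stepsizes, full client participation, and exact-loss comparison (the paper compares sampled function values) are illustrative assumptions, not the paper's setup.

```python
# Minimal sketch of the FedChain template (Algorithm 1), under the
# assumptions stated above; not the paper's implementation.
import numpy as np

rng = np.random.default_rng(0)

# Toy heterogeneous least-squares clients: client i holds (A_i, b_i).
clients = [(rng.normal(size=(20, 5)), rng.normal(size=20)) for _ in range(10)]

def client_grad(w, client):
    A, b = client
    return A.T @ (A @ w - b) / len(b)

def global_loss(w):
    return float(np.mean([0.5 * np.mean((A @ w - b) ** 2) for A, b in clients]))

def fedavg(w, rounds, local_steps=20, eta=0.01):
    """A_local: each client runs several local gradient steps per round,
    then the server averages the client iterates."""
    for _ in range(rounds):
        updates = []
        for c in clients:
            w_i = w.copy()
            for _ in range(local_steps):
                w_i = w_i - eta * client_grad(w_i, c)
            updates.append(w_i)
        w = np.mean(updates, axis=0)
    return w

def sgd(w, rounds, eta=0.05):
    """A_global: one stochastic step per round, using the gradient of a
    uniformly sampled client (unbiased for the global loss)."""
    for _ in range(rounds):
        w = w - eta * client_grad(w, clients[rng.integers(len(clients))])
    return w

def fedchain(w0, rounds=100, switch_frac=0.1):
    r_local = max(1, int(switch_frac * rounds))
    w_mid = fedavg(w0, r_local)                            # phase 1: local method
    # Keep the better of the starting point and the local-phase output
    # (the paper approximates this comparison with sampled function values).
    w_start = w_mid if global_loss(w_mid) <= global_loss(w0) else w0
    return sgd(w_start, rounds - r_local)                  # phase 2: global method

w_final = fedchain(np.zeros(5))
print("final global loss:", global_loss(w_final))
```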
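The tuning procedure quoted in the Experiment Setup row amounts to a grid search over the stepsize and the switch point. A minimal sketch follows, with the two grids copied verbatim from the quote; the evaluation function is a hypothetical stub standing in for one full training run (R = 500 rounds, 20 local epochs, 10 sampled clients per round).

```python
# Hedged sketch of the paper's hyperparameter grid search; the grids are
# from the quoted setup, the evaluation stub is an illustrative assumption.
import itertools

etas = [10 ** p for p in (-3.0, -2.5, -2.0, -1.5, -1.0)]
switch_fracs = [10 ** p for p in (-2.0, -1.625, -1.25, -0.875, -0.5)]

def final_loss(eta, switch_frac):
    """Hypothetical stub: in the paper, each configuration would be scored
    by a full training run. The dummy score only makes the sketch runnable."""
    return (eta - 0.01) ** 2 + (switch_frac - 0.1) ** 2

best_eta, best_frac = min(itertools.product(etas, switch_fracs),
                          key=lambda cfg: final_loss(*cfg))
print(f"selected stepsize={best_eta:.4g}, switch fraction={best_frac:.4g}")
```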