Stochastic Controlled Averaging for Federated Learning with Communication Compression
Authors: Xinmeng Huang, Ping Li, Xiaoyun Li
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments show that SCALLION and SCAFCOM outperform recent compressed FL methods under the same communication budget. We conduct experiments to illustrate the effectiveness of our proposed methods. |
| Researcher Affiliation | Collaboration | The work is conducted at LinkedIn, Bellevue, WA 98004, USA. Xinmeng Huang is a Ph.D. student in the Graduate Group of Applied Mathematics and Computational Science at the University of Pennsylvania. |
| Pseudocode | Yes | Algorithm 1 SCALLION: SCAFFOLD with single compressed uplink communication. Algorithm 2 SCAFCOM: SCAFFOLD with momentum-enhanced compression. (A generic sketch of the kind of compression operator involved is given after the table.) |
| Open Source Code | No | No explicit statement or link indicating the release of open-source code for the described methodology was found. |
| Open Datasets | Yes | We test our algorithms on two standard FL datasets: MNIST dataset (LeCun, 1998) and Fashion MNIST dataset (Xiao et al., 2017). |
| Dataset Splits | No | No explicit train/validation/test split is reported. The training data are distributed across N = 200 clients in a highly heterogeneous setting following (Li & Li, 2023): the training samples are split into 400 shards, each containing samples from only one class, and each client is randomly assigned two shards (a sketch of this sharding appears after the table). |
| Hardware Specification | No | No specific hardware specifications (e.g., GPU models, CPU types, memory) for running experiments were mentioned in the paper. |
| Software Dependencies | No | No specific software dependencies with version numbers (e.g., programming language, libraries, frameworks) were mentioned in the paper. |
| Experiment Setup | Yes | In each round of client-server interaction, we uniformly randomly pick S = 20 clients to participate in FL training, i.e., the partial participation rate is 10%. Each participating client performs K = 10 local training steps using the local data, with a mini-batch size of 32... We tune the combination of the global learning rate ηg and the local learning rate ηl over the 2D grid {0.001, 0.003, 0.01, 0.03, 0.1, 0.3, 1, 3, 10}² (see the sketch after the table). |
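
As an illustration of the compressed uplink mentioned in the Pseudocode row, the sketch below implements a standard unbiased rand-k sparsifier, a common compression operator in compressed FL. It is a generic example, not the specific operator used by SCALLION or SCAFCOM; the name `rand_k` and its parameters are illustrative only.

```python
import numpy as np

def rand_k(x, k, rng):
    """Unbiased rand-k sparsifier: keep k uniformly chosen coordinates of x and
    scale them by d/k so that E[rand_k(x)] = x. Generic illustration only."""
    d = x.size
    idx = rng.choice(d, size=k, replace=False)  # coordinates to keep
    out = np.zeros_like(x)
    out[idx] = x[idx] * (d / k)                 # rescale for unbiasedness
    return out

# Example: compress a 1000-dimensional update down to 50 transmitted coordinates.
rng = np.random.default_rng(0)
compressed = rand_k(rng.standard_normal(1000), k=50, rng=rng)
```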
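
The shard-based client split quoted in the Dataset Splits row can be reconstructed as follows. This is a minimal sketch assuming shards are formed by grouping sample indices by label; the function name `make_non_iid_shards` and the seeding are hypothetical, not taken from the paper.

```python
import numpy as np

def make_non_iid_shards(labels, num_clients=200, shards_per_client=2, seed=0):
    """Split sample indices into 400 label-homogeneous shards and assign two
    shards to each of 200 clients, mirroring the heterogeneous split described
    in the paper (following Li & Li, 2023). Sketch only."""
    rng = np.random.default_rng(seed)
    num_shards = num_clients * shards_per_client        # 200 * 2 = 400 shards
    order = np.argsort(labels, kind="stable")           # group indices by class
    shards = np.array_split(order, num_shards)          # each shard ~ one class
    shard_ids = rng.permutation(num_shards)             # shuffle shard assignment
    return [
        np.concatenate(
            [shards[s] for s in shard_ids[c * shards_per_client:(c + 1) * shards_per_client]]
        )
        for c in range(num_clients)
    ]
```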
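
Likewise, the partial-participation schedule and the learning-rate grid from the Experiment Setup row can be written out as a short sketch. The helper `fl_round_schedule` and the round count are illustrative placeholders; only the grid values, client counts, and participation rate come from the paper.

```python
import itertools
import numpy as np

# Grid over which (eta_g, eta_l) are tuned: 9 x 9 = 81 combinations.
GRID = [0.001, 0.003, 0.01, 0.03, 0.1, 0.3, 1, 3, 10]
grid_points = list(itertools.product(GRID, GRID))

def fl_round_schedule(num_rounds, num_clients=200, clients_per_round=20, seed=0):
    """Yield, for each round, the uniformly sampled set of S = 20 participating
    clients out of N = 200 (10% partial participation). `num_rounds` is a
    placeholder; the paper's total round count is not reproduced here."""
    rng = np.random.default_rng(seed)
    for _ in range(num_rounds):
        yield rng.choice(num_clients, size=clients_per_round, replace=False)
```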