Coordinating Momenta for Cross-Silo Federated Learning

Authors: An Xu, Heng Huang

AAAI 2022, pp. 8735-8743

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Extensive deep FL experimental results verify that our new approach has a better training performance than the FedAvg and existing standard momentum SGD variants."
Researcher Affiliation | Academia | "Electrical and Computer Engineering Department, University of Pittsburgh, PA, USA. an.xu@pitt.edu, heng.huang@pitt.edu"
Pseudocode | Yes | "Algorithm 1: FL with double momenta." (An illustrative sketch of a double-momentum round appears after the table.)
Open Source Code | No | The paper does not provide any explicit statements about open-source code release or repository links for the described methodology.
Open Datasets | Yes | "We train VGG-16 (Simonyan and Zisserman 2014) and ResNet-56 (He et al. 2016) models on CIFAR-10/100 (Krizhevsky 2009) [1], and ResNet-20 on SVHN [2] image classification tasks." [1] https://www.cs.toronto.edu/~kriz/cifar.html [2] http://ufldl.stanford.edu/housenumbers/
Dataset Splits | Yes | "We follow (Karimireddy et al. 2020b) to simulate the non-i.i.d. data distribution. Specifically, fraction s of the data are randomly selected and allocated to clients, while the remaining fraction 1 − s are allocated by sorting according to the label. The data similarity is hence s. We run experiments with data similarity s ∈ {5%, 10%, 20%}. By default, the data similarity is set to 10% and the number of clients (GPUs) K = 16 following (Wang et al. 2020)." and "We use local epoch E instead of local training steps P in experiments. E = 1 is identical to one pass training of local data. We test local epoch E ∈ {0.5, 1, 2} and E = 1 by default." (A sketch of this partitioning recipe appears after the table.)
Hardware Specification | Yes | "All experiments are implemented using PyTorch (Paszke et al. 2019) and run on a cluster where each node is equipped with 4 Tesla P40 GPUs and 64 Intel(R) Xeon(R) CPU E5-2683 v4 cores @ 2.10GHz."
Software Dependencies | No | The paper mentions 'PyTorch (Paszke et al. 2019)' but does not specify a version number for PyTorch or other software dependencies.
Experiment Setup | Yes | "We perform careful hyper-parameter tuning for all methods. The local momentum constant µ_l is selected from {0.9, 0.8, 0.6, 0.4, 0.2}. We select the server momentum constant µ_s from {0.9, 0.6, 0.3}. The base learning rate is selected from {..., 4×10^-1, 2×10^-1, 1×10^-1, 5×10^-2, 1×10^-2, 5×10^-3, ...}. The server learning rate α is selected from {0.2, 0.4, 0.6, 0.8, 0.9, 1.0}. The momentum fusion constant β is selected from {0.2, 0.4, 0.6, 0.8, 0.9, 1.0}." (The full search grid is sketched in code below.)
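
To make the "FL with double momenta" row concrete, here is a minimal PyTorch sketch of one double-momentum round: each client runs local momentum SGD (constant µ_l) and the server applies a momentum step (constant µ_s, server learning rate α) to the averaged client deltas. The function names and update order are assumptions for illustration; the paper's Algorithm 1 additionally couples the server and local momenta via the fusion constant β, which this sketch omits.

```python
# Illustrative sketch of one FL round with double momenta (local + server).
# `mu_l`, `mu_s`, and `alpha` match the paper's hyper-parameter names; the
# rest is an assumed reconstruction, not the paper's actual Algorithm 1.
import copy
import torch

def local_update(global_model, loader, lr, mu_l, steps):
    """Momentum SGD on one client; returns the parameter delta w_0 - w_P."""
    model = copy.deepcopy(global_model)
    opt = torch.optim.SGD(model.parameters(), lr=lr, momentum=mu_l)
    loss_fn = torch.nn.CrossEntropyLoss()
    init = [p.detach().clone() for p in model.parameters()]
    data = iter(loader)
    for _ in range(steps):
        try:
            x, y = next(data)
        except StopIteration:  # restart the loader if we run out of batches
            data = iter(loader)
            x, y = next(data)
        opt.zero_grad()
        loss_fn(model(x), y).backward()
        opt.step()
    return [w0 - p.detach() for w0, p in zip(init, model.parameters())]

def server_round(global_model, client_loaders, lr, mu_l, mu_s, alpha, steps, m_s=None):
    """Average the client deltas, then take a server momentum step."""
    deltas = [local_update(global_model, dl, lr, mu_l, steps) for dl in client_loaders]
    avg = [torch.stack(ds).mean(dim=0) for ds in zip(*deltas)]
    if m_s is None:
        m_s = [torch.zeros_like(d) for d in avg]
    with torch.no_grad():
        for p, m, d in zip(global_model.parameters(), m_s, avg):
            m.mul_(mu_s).add_(d)  # m_s <- mu_s * m_s + averaged delta
            p.sub_(alpha * m)     # w   <- w - alpha * m_s
    return m_s  # carry the server momentum buffer across rounds
```

Note that with µ_s = 0 and α = 1 the server step reduces to plain parameter averaging, i.e. FedAvg, which is the baseline the paper compares against.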
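The non-i.i.d. simulation quoted under Dataset Splits can also be sketched directly: a fraction s of the sample indices is spread uniformly at random across the K clients, and the remaining 1 − s is dealt out sorted by label, so each client sees only a few classes. The helper below is a hypothetical reconstruction of that recipe, not code from the paper or from (Karimireddy et al. 2020b).

```python
# Sketch of the similarity-s partition: fraction s is i.i.d. across clients,
# the remaining 1 - s is sorted by label and split into contiguous shards.
import numpy as np

def similarity_partition(labels, num_clients=16, s=0.10, seed=0):
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(labels))
    n_iid = int(s * len(labels))
    iid_part, sorted_part = idx[:n_iid], idx[n_iid:]
    # Non-i.i.d. part: sort by label so each shard covers few classes.
    sorted_part = sorted_part[np.argsort(np.asarray(labels)[sorted_part])]
    clients = [[] for _ in range(num_clients)]
    for k, shard in enumerate(np.array_split(iid_part, num_clients)):
        clients[k].extend(shard.tolist())
    for k, shard in enumerate(np.array_split(sorted_part, num_clients)):
        clients[k].extend(shard.tolist())
    return clients  # one list of sample indices per client

# Example matching the paper's default setting (s = 10%, K = 16):
# parts = similarity_partition(train_set.targets, num_clients=16, s=0.10)
```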
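Finally, the Experiment Setup row amounts to a grid search over five constants. A minimal sketch of that grid follows; the values are taken from the quote (the learning-rate list is elided with "..." in the paper, so only the listed subset appears), and run_trial is a hypothetical training entry point, not an interface from the paper.

```python
# Sketch of the hyper-parameter search grid quoted under "Experiment Setup".
from itertools import product

grid = {
    "mu_l":  [0.9, 0.8, 0.6, 0.4, 0.2],            # local momentum constant
    "mu_s":  [0.9, 0.6, 0.3],                       # server momentum constant
    "lr":    [4e-1, 2e-1, 1e-1, 5e-2, 1e-2, 5e-3],  # base lr (listed subset only)
    "alpha": [0.2, 0.4, 0.6, 0.8, 0.9, 1.0],        # server learning rate
    "beta":  [0.2, 0.4, 0.6, 0.8, 0.9, 1.0],        # momentum fusion constant
}

for values in product(*grid.values()):
    config = dict(zip(grid.keys(), values))
    # run_trial(config)  # hypothetical: train with this config, record accuracy
```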