Federated Learning with Matched Averaging

Authors: Hongyi Wang, Mikhail Yurochkin, Yuekai Sun, Dimitris Papailiopoulos, Yasaman Khazaeni

ICLR 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our experiments indicate that FedMA not only outperforms popular state-of-the-art federated learning algorithms on deep CNN and LSTM architectures trained on real world datasets, but also reduces the overall communication burden.
Researcher Affiliation | Collaboration | Hongyi Wang, Department of Computer Sciences, University of Wisconsin-Madison (hongyiwang@cs.wisc.edu); Mikhail Yurochkin, IBM Research, MIT-IBM Watson AI Lab (mikhail.yurochkin@ibm.com); Yuekai Sun, Department of Statistics, University of Michigan (yuekai@umich.edu); Dimitris Papailiopoulos, Department of Electrical and Computer Engineering, University of Wisconsin-Madison (dimitris@papail.io); Yasaman Khazaeni, IBM Research (yasaman.khazaeni@us.ibm.com)
Pseudocode | Yes | Algorithm 1: Federated Matched Averaging (FedMA) (a simplified layer-wise sketch appears after this table)
Open Source Code | Yes | Code is available at https://github.com/IBM/FedMA
Open Datasets | Yes | Our experimental studies are conducted over three real world datasets. Summary information about the datasets and associated models can be found in supplement Table 3. ... MNIST ... CIFAR-10 ... Shakespeare (McMahan et al., 2017) (a dataset-loading sketch appears after this table)
Dataset Splits | No | For CIFAR-10, we considered two data partition strategies to simulate a federated learning scenario: (i) homogeneous partition, where each local client has approximately equal proportions of each of the classes; (ii) heterogeneous partition, for which the number of data points and class proportions are unbalanced (a partitioning sketch appears after this table). ... We use the original test set in CIFAR-10 as our global test set for comparing performance of all methods. For the Shakespeare dataset, ... We allocate 80% of the data for training and amalgamate the remaining data into a global test set. The paper specifies train and test splits, and its reference to the 'original test set' for CIFAR-10 implies the standard split; however, it does not explicitly provide a separate validation split with percentages or counts for any of the datasets used.
Hardware Specification | Yes | All nodes in our experiments are deployed on p3.2xlarge instances on Amazon EC2.
Software Dependencies | No | We implemented FedMA and the considered baseline methods in PyTorch (Paszke et al., 2017). ... We use the AMSGRAD (Reddi et al., 2018) method for the oversampling baseline (an optimizer sketch appears after this table). While PyTorch and AMSGRAD are mentioned, specific version numbers for PyTorch or other libraries/frameworks are not provided.
Experiment Setup | Yes | The details of the datasets and hyper-parameters used in our experiments are summarized in Table 3. In conducting the freezing and retraining process of FedMA, we notice that when retraining the last FC layer while keeping all previous layers frozen, the initial learning rate we use for SGD doesn't lead to a good convergence (this is only for the VGG-9 architecture). To fix this issue, we divide the initial learning rate by 10, i.e. using 10^-4 for the last FC layer retraining, and allow the clients to retrain for 3 times more epochs (a freeze-and-retrain sketch appears after this table). ... For each of the candidate E, we run FedMA for 6 rounds while FedAvg and FedProx run for 54 rounds. ... We empirically analyze the different choices of the three hyper-parameters and find that the choice of γ0 = 7, σ0^2 = 1, σ^2 = 1 for VGG-9 on the CIFAR-10 dataset and γ0 = 10^-3, σ0^2 = 1, σ^2 = 1 for LSTM on the Shakespeare dataset leads to good performance in our experimental studies.
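
To give the shape of Algorithm 1 without the full Bayesian nonparametric machinery, the following is a minimal two-client Python sketch. It substitutes Hungarian matching (scipy.optimize.linear_sum_assignment) for the paper's BBP-MAP inference, keeps the global layer width fixed instead of letting it grow, and the client methods (get_layer_weights, set_layer_weights, retrain_from) are hypothetical placeholders for a local training harness, not the repository's API.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def matched_average_pair(w_a, w_b):
    """Permute client B's neurons to best align with client A's, then average.

    w_a, w_b: (num_neurons, fan_in) weight matrices for one layer.
    """
    # cost[i, j] = squared distance between neuron i of A and neuron j of B
    cost = ((w_a[:, None, :] - w_b[None, :, :]) ** 2).sum(axis=-1)
    rows, cols = linear_sum_assignment(cost)  # Hungarian matching
    return 0.5 * (w_a[rows] + w_b[cols])


def fedma_round(client_a, client_b, num_layers):
    """One simplified FedMA round: match and average layer by layer,
    retraining the not-yet-matched layers in between, so one round costs
    num_layers communication steps as in the paper."""
    for layer in range(num_layers):
        w_global = matched_average_pair(
            client_a.get_layer_weights(layer),   # hypothetical client API
            client_b.get_layer_weights(layer),
        )
        for client in (client_a, client_b):
            client.set_layer_weights(layer, w_global)  # layers <= `layer` stay frozen
            client.retrain_from(layer + 1)             # locally retrain deeper layers
```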
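For reference, the two vision datasets are fetchable through torchvision; the snippet below is a sketch with an illustrative data path. The Shakespeare corpus is not part of torchvision; it comes from the federated setup of McMahan et al. (2017) and must be obtained separately (e.g. from the LEAF benchmark suite).

```python
import torchvision
import torchvision.transforms as T

transform = T.ToTensor()
mnist = torchvision.datasets.MNIST(
    root="./data", train=True, download=True, transform=transform)
cifar10_train = torchvision.datasets.CIFAR10(
    root="./data", train=True, download=True, transform=transform)
# The "original test set" the paper uses as its global CIFAR-10 test set:
cifar10_test = torchvision.datasets.CIFAR10(
    root="./data", train=False, download=True, transform=transform)
```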
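The two CIFAR-10 partition strategies described under Dataset Splits can be mimicked in a few lines of NumPy. This is a sketch rather than the paper's exact sampler: the heterogeneous branch draws per-class client proportions from a Dirichlet distribution, and the concentration parameter alpha is an assumed knob (smaller values give more skew), not a value quoted in the paper.

```python
import numpy as np


def partition_labels(labels, num_clients, alpha=None, seed=0):
    """Split sample indices across clients by class.

    alpha=None  -> homogeneous: every client gets an equal share of each class.
    alpha=float -> heterogeneous: per-class client proportions are drawn from
                   Dirichlet(alpha), so clients end up with unbalanced sizes
                   and skewed class mixtures.
    """
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    client_indices = [[] for _ in range(num_clients)]
    for c in np.unique(labels):
        idx = rng.permutation(np.flatnonzero(labels == c))
        if alpha is None:
            proportions = np.full(num_clients, 1.0 / num_clients)
        else:
            proportions = rng.dirichlet(np.full(num_clients, alpha))
        # Cut points inside this class's index list, one chunk per client.
        cuts = (np.cumsum(proportions)[:-1] * len(idx)).astype(int)
        for k, part in enumerate(np.split(idx, cuts)):
            client_indices[k].extend(part.tolist())
    return client_indices
```

Calling partition_labels(labels, 16) reproduces a balanced split, while partition_labels(labels, 16, alpha=0.5) yields clients with unbalanced sizes and skewed class mixtures.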
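Regarding software dependencies: PyTorch exposes AMSGrad as a flag on its Adam optimizer, so the baseline's optimizer construction plausibly reduces to a one-liner. The model and learning rate below are placeholders, not values from the paper.

```python
import torch

model = torch.nn.Linear(10, 2)  # placeholder model, for illustration only
# AMSGRAD (Reddi et al., 2018) is the `amsgrad` variant of Adam in PyTorch;
# the learning rate here is illustrative, not taken from the paper.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, amsgrad=True)
```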
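Finally, the last-FC-layer fix from the Experiment Setup row (SGD learning rate divided by 10, epochs tripled) might look like the sketch below. Here train_fn stands in for a hypothetical local training loop, the base learning rate of 10^-3 is inferred from the quoted post-division value of 10^-4, and base_epochs is an illustrative default.

```python
import torch


def retrain_last_fc(model, train_fn, base_lr=1e-3, base_epochs=10):
    """Freeze everything except the final FC head, then retrain that head
    with a 10x smaller SGD learning rate for 3x more epochs.

    `train_fn(model, optimizer, epochs)` is a hypothetical local training
    loop; `base_lr` and `base_epochs` are assumed defaults, not paper values.
    """
    for p in model.parameters():
        p.requires_grad = False
    head = list(model.children())[-1]  # assumes the final module is the FC head
    for p in head.parameters():
        p.requires_grad = True
    optimizer = torch.optim.SGD(head.parameters(), lr=base_lr / 10)  # i.e. 10^-4
    train_fn(model, optimizer, epochs=3 * base_epochs)
```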