Federated Learning with Matched Averaging

Authors: Hongyi Wang, Mikhail Yurochkin, Yuekai Sun, Dimitris Papailiopoulos, Yasaman Khazaeni

ICLR 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our experiments indicate that FedMA not only outperforms popular state-of-the-art federated learning algorithms on deep CNN and LSTM architectures trained on real world datasets, but also reduces the overall communication burden.
Researcher Affiliation | Collaboration | Hongyi Wang, Department of Computer Sciences, University of Wisconsin-Madison (hongyiwang@cs.wisc.edu); Mikhail Yurochkin, IBM Research, MIT-IBM Watson AI Lab (mikhail.yurochkin@ibm.com); Yuekai Sun, Department of Statistics, University of Michigan (yuekai@umich.edu); Dimitris Papailiopoulos, Department of Electrical and Computer Engineering, University of Wisconsin-Madison (dimitris@papail.io); Yasaman Khazaeni, IBM Research (yasaman.khazaeni@us.ibm.com)
Pseudocode | Yes | Algorithm 1: Federated Matched Averaging (FedMA) (a simplified layer-wise sketch appears after this table)
Open Source Code | Yes | Code is available at https://github.com/IBM/FedMA
Open Datasets | Yes | Our experimental studies are conducted over three real world datasets. Summary information about the datasets and associated models can be found in supplement Table 3. ... MNIST ... CIFAR-10 ... Shakespeare (McMahan et al., 2017) (a dataset-loading sketch appears after this table)
Dataset Splits | No | For CIFAR-10, we considered two data partition strategies to simulate a federated learning scenario: (i) homogeneous partition, where each local client has approximately equal proportions of each of the classes; (ii) heterogeneous partition, for which the number of data points and class proportions are unbalanced (a partitioning sketch appears after this table). ... We use the original test set in CIFAR-10 as our global test set for comparing performance of all methods. For the Shakespeare dataset, ... We allocate 80% of the data for training and amalgamate the remaining data into a global test set. The paper specifies train and test splits, and its reference to the 'original test set' for CIFAR-10 implies the standard split; however, it does not explicitly provide a separate validation split with percentages or counts for any of the datasets used.
Hardware Specification | Yes | All nodes in our experiments are deployed on p3.2xlarge instances on Amazon EC2.
Software Dependencies | No | We implemented FedMA and the considered baseline methods in PyTorch (Paszke et al., 2017). ... We use the AMSGRAD (Reddi et al., 2018) method for the oversampling baseline (an optimizer sketch appears after this table). While PyTorch and AMSGRAD are mentioned, specific version numbers for PyTorch or other libraries/frameworks are not provided.
Experiment Setup | Yes | The details of the datasets and hyper-parameters used in our experiments are summarized in Table 3. In conducting the freezing and retraining process of FedMA, we notice that when retraining the last FC layer while keeping all previous layers frozen, the initial learning rate we use for SGD doesn't lead to a good convergence (this is only for the VGG-9 architecture). To fix this issue, we divide the initial learning rate by 10, i.e. using 10^-4 for the last FC layer retraining, and allow the clients to retrain for 3 times more epochs (a freeze-and-retrain sketch appears after this table). ... For each of the candidate E, we run FedMA for 6 rounds while FedAvg and FedProx run for 54 rounds. ... We empirically analyze the different choices of the three hyper-parameters and find that the choice of γ0 = 7, σ0^2 = 1, σ^2 = 1 for VGG-9 on the CIFAR-10 dataset and γ0 = 10^-3, σ0^2 = 1, σ^2 = 1 for LSTM on the Shakespeare dataset leads to good performance in our experimental studies.
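
To give the shape of Algorithm 1 without the full Bayesian nonparametric machinery, the following is a minimal two-client Python sketch. It substitutes Hungarian matching (scipy.optimize.linear_sum_assignment) for the paper's BBP-MAP inference, keeps the global layer width fixed instead of letting it grow, and the client methods (get_layer_weights, set_layer_weights, retrain_from) are hypothetical placeholders for a local training harness, not the repository's API.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def matched_average_pair(w_a, w_b):
    """Permute client B's neurons to best align with client A's, then average.

    w_a, w_b: (num_neurons, fan_in) weight matrices for one layer.
    """
    # cost[i, j] = squared distance between neuron i of A and neuron j of B
    cost = ((w_a[:, None, :] - w_b[None, :, :]) ** 2).sum(axis=-1)
    rows, cols = linear_sum_assignment(cost)  # Hungarian matching
    return 0.5 * (w_a[rows] + w_b[cols])


def fedma_round(client_a, client_b, num_layers):
    """One simplified FedMA round: match and average layer by layer,
    retraining the not-yet-matched layers in between, so one round costs
    num_layers communication steps as in the paper."""
    for layer in range(num_layers):
        w_global = matched_average_pair(
            client_a.get_layer_weights(layer),   # hypothetical client API
            client_b.get_layer_weights(layer),
        )
        for client in (client_a, client_b):
            client.set_layer_weights(layer, w_global)  # layers <= `layer` stay frozen
            client.retrain_from(layer + 1)             # locally retrain deeper layers
```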
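For reference, the two vision datasets are fetchable through torchvision; the snippet below is a sketch with an illustrative data path. The Shakespeare corpus is not part of torchvision; it comes from the federated setup of McMahan et al. (2017) and must be obtained separately (e.g. from the LEAF benchmark suite).

```python
import torchvision
import torchvision.transforms as T

transform = T.ToTensor()
mnist = torchvision.datasets.MNIST(
    root="./data", train=True, download=True, transform=transform)
cifar10_train = torchvision.datasets.CIFAR10(
    root="./data", train=True, download=True, transform=transform)
# The "original test set" the paper uses as its global CIFAR-10 test set:
cifar10_test = torchvision.datasets.CIFAR10(
    root="./data", train=False, download=True, transform=transform)
```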
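The two CIFAR-10 partition strategies described under Dataset Splits can be mimicked in a few lines of NumPy. This is a sketch rather than the paper's exact sampler: the heterogeneous branch draws per-class client proportions from a Dirichlet distribution, and the concentration parameter alpha is an assumed knob (smaller values give more skew), not a value quoted in the paper.

```python
import numpy as np


def partition_labels(labels, num_clients, alpha=None, seed=0):
    """Split sample indices across clients by class.

    alpha=None  -> homogeneous: every client gets an equal share of each class.
    alpha=float -> heterogeneous: per-class client proportions are drawn from
                   Dirichlet(alpha), so clients end up with unbalanced sizes
                   and skewed class mixtures.
    """
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    client_indices = [[] for _ in range(num_clients)]
    for c in np.unique(labels):
        idx = rng.permutation(np.flatnonzero(labels == c))
        if alpha is None:
            proportions = np.full(num_clients, 1.0 / num_clients)
        else:
            proportions = rng.dirichlet(np.full(num_clients, alpha))
        # Cut points inside this class's index list, one chunk per client.
        cuts = (np.cumsum(proportions)[:-1] * len(idx)).astype(int)
        for k, part in enumerate(np.split(idx, cuts)):
            client_indices[k].extend(part.tolist())
    return client_indices
```

Calling partition_labels(labels, 16) reproduces a balanced split, while partition_labels(labels, 16, alpha=0.5) yields clients with unbalanced sizes and skewed class mixtures.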
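Regarding software dependencies: PyTorch exposes AMSGrad as a flag on its Adam optimizer, so the baseline's optimizer construction plausibly reduces to a one-liner. The model and learning rate below are placeholders, not values from the paper.

```python
import torch

model = torch.nn.Linear(10, 2)  # placeholder model, for illustration only
# AMSGRAD (Reddi et al., 2018) is the `amsgrad` variant of Adam in PyTorch;
# the learning rate here is illustrative, not taken from the paper.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, amsgrad=True)
```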
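Finally, the last-FC-layer fix from the Experiment Setup row (SGD learning rate divided by 10, epochs tripled) might look like the sketch below. Here train_fn stands in for a hypothetical local training loop, the base learning rate of 10^-3 is inferred from the quoted post-division value of 10^-4, and base_epochs is an illustrative default.

```python
import torch


def retrain_last_fc(model, train_fn, base_lr=1e-3, base_epochs=10):
    """Freeze everything except the final FC head, then retrain that head
    with a 10x smaller SGD learning rate for 3x more epochs.

    `train_fn(model, optimizer, epochs)` is a hypothetical local training
    loop; `base_lr` and `base_epochs` are assumed defaults, not paper values.
    """
    for p in model.parameters():
        p.requires_grad = False
    head = list(model.children())[-1]  # assumes the final module is the FC head
    for p in head.parameters():
        p.requires_grad = True
    optimizer = torch.optim.SGD(head.parameters(), lr=base_lr / 10)  # i.e. 10^-4
    train_fn(model, optimizer, epochs=3 * base_epochs)
```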