On the Convergence of Communication-Efficient Local SGD for Federated Learning

Authors: Hongchang Gao, An Xu, Heng Huang (pp. 7510-7518)

AAAI 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "At last, extensive experiments are conducted to verify the performance of our proposed methods." and "Extensive experimental results confirmed the effectiveness of our proposed methods."
Researcher Affiliation | Collaboration | (1) Department of Computer and Information Sciences, Temple University, PA, USA; (2) Department of Electrical and Computer Engineering, University of Pittsburgh, PA, USA; (3) JD Finance America Corporation, Mountain View, CA, USA
Pseudocode | Yes | Algorithm 1 (Local SGD with Compressed Gradients) and Algorithm 2 (Momentum Local SGD with Compressed Gradients); a hedged sketch of Algorithm 1 follows this table.
Open Source Code | No | The paper does not provide an explicit statement or link to open-source code for the described methodology.
Open Datasets | Yes | "CIFAR-10: We test ResNet-56 (He et al. 2016) with all the above mentioned algorithms on the CIFAR-10 dataset (Krizhevsky, Hinton et al. 2009)." and "ImageNet: We test ResNet-50 (He et al. 2016) on the ImageNet dataset (Russakovsky et al. 2015)."
Dataset Splits | No | The paper mentions training and testing but does not specify explicit train/validation/test splits by percentage, count, or reference to predefined splits.
Hardware Specification | Yes | "All experiments are implemented in PyTorch (Paszke et al. 2019) and run on a cluster with NVIDIA Tesla P40 GPUs, where nodes are interconnected by a network with 40 Gbps bandwidth."
Software Dependencies | No | The paper mentions PyTorch (Paszke et al. 2019) but does not specify its version number or any other software dependencies with versions.
Experiment Setup | Yes | "The base learning rate is 0.1, the weight decay is 5×10⁻⁴, and the total batch size is 128. For local SGD, the model is trained for 150 epochs in total, with a learning rate decay of 0.1 at epoch 100. For momentum local SGD, the model is trained for 200 epochs in total, with a learning rate decay of 0.1 at epochs 100 and 150." A hedged configuration sketch appears after this table.
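
Because the Pseudocode row only names Algorithm 1 (Local SGD with Compressed Gradients) without reproducing it, the following is a minimal, hypothetical PyTorch-style sketch of that pattern: each worker runs several local SGD steps, sends a compressed model update, and the server averages the compressed updates. The top-k sparsifier, the toy linear model, and the synthetic data are illustrative assumptions, not the authors' implementation or their exact compression operator.

# Hypothetical sketch of a local-SGD-with-compressed-updates round (not the paper's code).
import copy
import torch

def topk_compress(delta, ratio=0.1):
    # Assumed compressor: keep only the largest-magnitude entries of a flat update.
    k = max(1, int(delta.numel() * ratio))
    _, indices = torch.topk(delta.abs(), k)
    compressed = torch.zeros_like(delta)
    compressed[indices] = delta[indices]
    return compressed

def local_sgd_round(global_model, data_per_worker, local_steps=5, lr=0.1):
    # One communication round: each worker trains locally from the global model,
    # compresses its accumulated update, and the server averages the updates.
    flat_global = torch.nn.utils.parameters_to_vector(global_model.parameters()).detach()
    updates = []
    for x, y in data_per_worker:
        worker = copy.deepcopy(global_model)
        opt = torch.optim.SGD(worker.parameters(), lr=lr)
        for _ in range(local_steps):
            opt.zero_grad()
            loss = torch.nn.functional.mse_loss(worker(x), y)
            loss.backward()
            opt.step()
        flat_local = torch.nn.utils.parameters_to_vector(worker.parameters()).detach()
        updates.append(topk_compress(flat_local - flat_global))
    avg_update = torch.stack(updates).mean(dim=0)
    torch.nn.utils.vector_to_parameters(flat_global + avg_update, global_model.parameters())

if __name__ == "__main__":
    torch.manual_seed(0)
    model = torch.nn.Linear(10, 1)                                   # toy stand-in model
    workers = [(torch.randn(32, 10), torch.randn(32, 1)) for _ in range(4)]  # synthetic shards
    for _ in range(20):
        local_sgd_round(model, workers)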
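
For the Experiment Setup row, the quoted hyperparameters map naturally onto a standard PyTorch SGD plus MultiStepLR configuration. The sketch below shows the momentum local SGD schedule (200 epochs, decay of 0.1 at epochs 100 and 150); the ResNet-18 stand-in and the momentum value 0.9 are assumptions, since the quoted text names ResNet-56 (not available in torchvision) and does not give a momentum coefficient.

# Hedged sketch of the reported training schedule; assumptions are noted inline.
import torch
import torchvision

model = torchvision.models.resnet18(num_classes=10)  # stand-in; the paper uses ResNet-56
optimizer = torch.optim.SGD(
    model.parameters(),
    lr=0.1,             # base learning rate from the paper
    momentum=0.9,       # assumed value for the momentum local SGD variant
    weight_decay=5e-4,  # weight decay from the paper
)
# Momentum local SGD: 200 epochs, LR x0.1 at epochs 100 and 150.
# Plain local SGD: 150 epochs, LR x0.1 at epoch 100, without momentum.
scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[100, 150], gamma=0.1)

for epoch in range(200):
    # ... one epoch of training with a total batch size of 128 across workers ...
    scheduler.step()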