DSGD-CECA: Decentralized SGD with Communication-Optimal Exact Consensus Algorithm

Authors: Lisang Ding, Kexin Jin, Bicheng Ying, Kun Yuan, Wotao Yin

ICML 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this section, we validate the previous theoretical results via numerical experiments. First, we show CECA-1P indeed achieves the global consensus in finite iterations over a variety of choices of the number of nodes. Next, we examine the performance of DSGD-CECA and compare it with many other popular SOTA algorithms on a standard convex task. Lastly, we apply the DSGD-CECA on the deep learning setting to show it still achieves good performance in train loss and test accuracy with respect to the iterations and communicated data.
Researcher Affiliation | Collaboration | 1 Department of Mathematics, University of California, Los Angeles, CA, USA; 2 Department of Mathematics, Princeton University, Princeton, NJ, USA; 3 Google Inc., Los Angeles, CA, USA; 4 Center for Machine Learning Research, Peking University, Beijing, P. R. China; 5 AI for Science Institute, Beijing, P. R. China; 6 National Engineering Laboratory for Big Data Analytics and Applications, Beijing, P. R. China; 7 Decision Intelligence Lab, Alibaba US, Bellevue, WA, USA.
Pseudocode | Yes | Algorithm 1 DSGD-CECA [...] Algorithm 2 CECA for the 2-port system (a hedged sketch of the consensus building block appears after this table).
Open Source Code | Yes | The codes used to generate the figures in this section are available in the GitHub repository: https://github.com/kexinjinnn/DSGD-CECA
Open Datasets | Yes | We apply DSGD-CECA-2P to solve the image classification task with CNN over MNIST dataset (LeCun et al., 2010). [...] We also provide additional experiments on CIFAR-10 dataset (Krizhevsky & Hinton, 2009) in Appendix C. (A loader sketch for both datasets appears after this table.)
Dataset Splits | No | The paper mentions training and testing for MNIST ('training loss and test accuracy curves') and CIFAR-10 ('The training process consists of 130 epochs' and 'test accuracy'), but it does not explicitly specify a validation dataset or how it was used.
Hardware Specification | Yes | As for the implementation of decentralized parallel training, we utilize the BlueFog library (Ying et al., 2021b) in a cluster of 17 NVIDIA GeForce RTX 2080 GPUs. [...] Similar to the MNIST experiments, we employ BlueFog for decentralized training using 5 NVIDIA GeForce RTX 2080 GPUs.
Software Dependencies | No | The paper mentions using the 'BlueFog library' and references a PyTorch implementation for CIFAR-10 ('Train CIFAR10 with PyTorch'). However, it does not provide specific version numbers for these software components or any other libraries/frameworks.
Experiment Setup | Yes | The local batch size is set to 64. The learning rate is 0.3 for DSGD-CECA-2P with no momentum and 0.1 for other algorithms with momentum 0.5. The results are obtained by averaging over 3 independent random experiments. [...] The training process consists of 130 epochs without momentum, with a weight decay of 10^-4. A local batch size of 64 is used, and the base learning rate is set to 0.01. The learning rate is reduced by a factor of 10 at the 50th, 100th, and 120th epochs. (These settings are sketched in code after this table.)
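
The Pseudocode row cites Algorithm 1 (DSGD-CECA) and Algorithm 2 (CECA for the 2-port system); the exact peer schedule and mixing weights for an arbitrary number of nodes are given only in the paper. The sketch below is not that algorithm: it illustrates the special case CECA generalizes, namely that a one-peer hypercube schedule reaches the exact global average in log2(n) rounds when n is a power of two, and shows one simplified way such a finite-time consensus step could be combined with local SGD. It runs all nodes in a single process; the function names, the grad_fn interface, and the "full consensus after every gradient step" interleaving are our assumptions, not the paper's schedule.

import numpy as np

def one_peer_hypercube_consensus(x):
    """Exact average consensus in log2(n) rounds when n is a power of two.

    x: (n, d) array; row i holds node i's local vector.
    In round t, node i averages with the single peer i XOR 2^t.
    CECA (Algorithm 2 in the paper) extends this finite-time, one-peer
    behavior to arbitrary n; this is only the power-of-two special case.
    """
    n = x.shape[0]
    assert n > 0 and n & (n - 1) == 0, "this toy schedule needs n to be a power of two"
    for t in range(int(np.log2(n))):
        prev = x.copy()
        for i in range(n):
            j = i ^ (1 << t)                 # the single peer of node i in round t
            x[i] = 0.5 * (prev[i] + prev[j])
    return x

def dsgd_with_exact_consensus(grad_fn, w0, n_nodes, steps, lr):
    """Single-process sketch of a DSGD-CECA-style loop (names are ours).

    Each node takes a local stochastic-gradient step on its own data,
    then the nodes run a finite-time consensus routine on their iterates.
    The paper interleaves gradient steps with individual communication
    rounds; this simplified version is only meant to show the structure.
    """
    w = np.tile(np.asarray(w0, dtype=float), (n_nodes, 1))
    for _ in range(steps):
        for i in range(n_nodes):
            w[i] -= lr * grad_fn(i, w[i])    # local SGD step on node i
        w = one_peer_hypercube_consensus(w)  # communication rounds
    return w.mean(axis=0)

if __name__ == "__main__":
    # Sanity check: the one-peer schedule reproduces the exact average.
    rng = np.random.default_rng(0)
    x = rng.normal(size=(8, 3))
    target = x.mean(axis=0)
    assert np.allclose(one_peer_hypercube_consensus(x.copy()), target)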
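
The Open Datasets and Experiment Setup rows report MNIST and CIFAR-10 with a local batch size of 64. Below is a minimal torchvision loader sketch consistent with those details; the normalization constants and the per-node sharding via DistributedSampler are standard choices assumed by us, not taken from the paper.

import torch
from torchvision import datasets, transforms

def make_loader(name, rank=0, world_size=1, batch_size=64, train=True):
    """Return a per-node DataLoader for MNIST or CIFAR-10.

    batch_size=64 matches the local batch size reported in the paper;
    transforms and sharding are assumptions for illustration only.
    """
    if name == "mnist":
        tf = transforms.Compose([transforms.ToTensor(),
                                 transforms.Normalize((0.1307,), (0.3081,))])
        ds = datasets.MNIST("./data", train=train, download=True, transform=tf)
    elif name == "cifar10":
        tf = transforms.Compose([transforms.ToTensor(),
                                 transforms.Normalize((0.4914, 0.4822, 0.4465),
                                                      (0.2470, 0.2435, 0.2616))])
        ds = datasets.CIFAR10("./data", train=train, download=True, transform=tf)
    else:
        raise ValueError(f"unknown dataset: {name}")
    sampler = torch.utils.data.distributed.DistributedSampler(
        ds, num_replicas=world_size, rank=rank, shuffle=train)
    return torch.utils.data.DataLoader(ds, batch_size=batch_size, sampler=sampler)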
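
The Experiment Setup row lists concrete hyperparameters: MNIST with learning rate 0.3 and no momentum for DSGD-CECA-2P versus 0.1 with momentum 0.5 for the baselines, and CIFAR-10 with 130 epochs, no momentum, weight decay 10^-4, base learning rate 0.01, and a tenfold decay at epochs 50, 100, and 120. A hedged PyTorch sketch of how these settings might be wired up follows; the use of torch.optim.SGD with MultiStepLR is our assumption about the implementation, while the numbers are those quoted above.

import torch

def build_cifar10_optimizer(model):
    """CIFAR-10 settings quoted above: base lr 0.01, no momentum,
    weight decay 1e-4, lr divided by 10 at epochs 50, 100, 120
    over 130 epochs. SGD + MultiStepLR is an assumption."""
    opt = torch.optim.SGD(model.parameters(), lr=0.01,
                          momentum=0.0, weight_decay=1e-4)
    sched = torch.optim.lr_scheduler.MultiStepLR(
        opt, milestones=[50, 100, 120], gamma=0.1)
    return opt, sched

def build_mnist_optimizer(model, algo="dsgd-ceca-2p"):
    """MNIST settings quoted above: lr 0.3 without momentum for
    DSGD-CECA-2P, lr 0.1 with momentum 0.5 for the other algorithms."""
    if algo == "dsgd-ceca-2p":
        return torch.optim.SGD(model.parameters(), lr=0.3, momentum=0.0)
    return torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.5)

# Typical usage (sketch): step the scheduler once per epoch.
# opt, sched = build_cifar10_optimizer(model)
# for epoch in range(130):
#     train_one_epoch(model, loader, opt)   # hypothetical helper
#     sched.step()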