Communication-Efficient Distributed Blockwise Momentum SGD with Error-Feedback
Authors: Shuai Zheng, Ziyue Huang, James Kwok
NeurIPS 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results show that the proposed method converges as fast as full-precision distributed momentum SGD and achieves the same testing accuracy. In particular, on distributed ResNet training with 7 workers on ImageNet, the proposed algorithm achieves the same testing accuracy as momentum SGD using full-precision gradients, but with 46% less wall-clock time. |
| Researcher Affiliation | Collaboration | Shuai Zheng (1,2), Ziyue Huang (1), James T. Kwok (1); shzheng@amazon.com, {zhuangbq, jamesk}@cse.ust.hk; (1) Department of Computer Science and Engineering, Hong Kong University of Science and Technology; (2) Amazon Web Services |
| Pseudocode | Yes | Algorithm 2 Distributed SGD with Error-Feedback (dist-EF-SGD); Algorithm 3 Distributed Blockwise SGD with Error-Feedback (dist-EF-blockSGD); Algorithm 4 Distributed Blockwise Momentum SGD with Error-Feedback (dist-EF-blockSGDM). See the error-feedback sketch after this table. |
| Open Source Code | No | The paper mentions using 'publicly available code in [4]' for comparisons, but does not provide its own source code for the methodology described. |
| Open Datasets | Yes | Experiment is performed on the CIFAR-100 dataset, with 50K training images and 10K test images. [...] In this section, we perform distributed optimization on ImageNet [15] using a 50-layer ResNet. |
| Dataset Splits | No | The paper mentions '50K training images and 10K test images' for CIFAR-100 but does not specify a separate validation set split. |
| Hardware Specification | Yes | For faster experimentation, we use a single node with multiple GPUs (an AWS P3.16 instance with 8 Nvidia V100 GPUs, each GPU being a worker) instead of a distributed setting. [...] Each worker is an AWS P3.2 instance with 1 GPU, and the parameter server is housed in one node. |
| Software Dependencies | No | The paper mentions 'MXNet', 'PyTorch', and the 'Gloo communication library' but does not specify version numbers for any of these software dependencies. |
| Experiment Setup | Yes | We vary the mini-batch size per worker in {8, 16, 32}. [...] At epoch 100, the learning rate is reduced... |
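
For readers unfamiliar with the error-feedback idea behind the algorithms listed in the Pseudocode row, the following is a minimal single-worker sketch in Python/NumPy of blockwise gradient compression with an error-feedback buffer and a momentum update. It illustrates the general technique only, not the paper's exact dist-EF-blockSGDM, which additionally aggregates compressed blocks from multiple workers and applies a second round of compression with error feedback on the server. The function names and the scaled-sign compressor here are illustrative assumptions.

```python
import numpy as np

def sign_compress(v):
    """1-bit sign compression scaled by the mean magnitude (a common choice
    for error-feedback methods; the paper's exact compressor may differ)."""
    scale = np.mean(np.abs(v))
    return scale * np.sign(v)

def dist_ef_blockwise_step(blocks, grads, errors, momenta, lr=0.1, beta=0.9):
    """One hypothetical step of blockwise error-feedback momentum SGD.

    blocks  : list of parameter arrays (one entry per block, e.g. per layer)
    grads   : list of stochastic gradients, same shapes as `blocks`
    errors  : list of error-feedback buffers (compression residuals)
    momenta : list of momentum buffers, same shapes as `blocks`
    """
    for i, (w, g, e, m) in enumerate(zip(blocks, grads, errors, momenta)):
        # Worker side: correct the gradient with the accumulated residual,
        # compress the block, and remember what the compression lost.
        corrected = g + e
        compressed = sign_compress(corrected)
        errors[i] = corrected - compressed          # error feedback
        # Server side (simplified to one worker): momentum update on the
        # compressed block, then the parameter update.
        momenta[i] = beta * m + compressed
        blocks[i] = w - lr * momenta[i]
    return blocks, errors, momenta

# Toy usage: two parameter "blocks" with random gradients.
rng = np.random.default_rng(0)
blocks = [rng.standard_normal(5), rng.standard_normal(3)]
errors = [np.zeros_like(b) for b in blocks]
momenta = [np.zeros_like(b) for b in blocks]
grads = [rng.standard_normal(b.shape) for b in blocks]
blocks, errors, momenta = dist_ef_blockwise_step(blocks, grads, errors, momenta)
```

The blockwise aspect is simply that compression (and its residual) is applied per parameter block rather than to the full flattened gradient, which is what the "blockwise" variants in the table refer to.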