Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026.
Communication-Efficient Distributed Blockwise Momentum SGD with Error-Feedback
Authors: Shuai Zheng, Ziyue Huang, James Kwok
NeurIPS 2019 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results show that the proposed method converges as fast as full-precision distributed momentum SGD and achieves the same testing accuracy. In particular, on distributed ResNet training with 7 workers on the ImageNet, the proposed algorithm achieves the same testing accuracy as momentum SGD using full-precision gradients, but with 46% less wall clock time. |
| Researcher Affiliation | Collaboration | Shuai Zheng (1, 2), Ziyue Huang (1), James T. Kwok (1); EMAIL, EMAIL; (1) Department of Computer Science and Engineering, Hong Kong University of Science and Technology; (2) Amazon Web Services |
| Pseudocode | Yes | Algorithm 2 Distributed SGD with Error-Feedback (dist-EF-SGD); Algorithm 3 Distributed Blockwise SGD with Error-Feedback (dist-EF-blockSGD); Algorithm 4 Distributed Blockwise Momentum SGD with Error-Feedback (dist-EF-blockSGDM) |
| Open Source Code | No | The paper mentions using 'publicly available code3 in [4]' for comparisons, but does not provide its own source code for the methodology described. |
| Open Datasets | Yes | Experiment is performed on the CIFAR-100 dataset, with 50K training images and 10K test images. [...] In this section, we perform distributed optimization on ImageNet [15] using a 50-layer ResNet. |
| Dataset Splits | No | The paper mentions '50K training images and 10K test images' for CIFAR-100 but does not specify a separate validation set split. |
| Hardware Specification | Yes | For faster experimentation, we use a single node with multiple GPUs (an AWS P3.16 instance with 8 Nvidia V100 GPUs, each GPU being a worker) instead of a distributed setting. [...] Each worker is an AWS P3.2 instance with 1 GPU, and the parameter server is housed in one node. |
| Software Dependencies | No | The paper mentions 'MXNet', 'PyTorch', and 'Gloo communication library' but does not specify version numbers for any of these software dependencies. |
| Experiment Setup | Yes | We vary the mini-batch size per worker in {8, 16, 32}. [...] At epoch 100, the learning rate is reduced... |
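
The Pseudocode row names error-feedback algorithms without reproducing them. As a rough illustration of the general error-feedback idea only, the sketch below shows one single-worker step with a scaled-sign compressor. It is not the paper's Algorithm 2, 3, or 4: the function names `scaled_sign` and `ef_step`, the choice of compressor, and the omission of blocks, momentum, and the parameter server are all assumptions made for this example.

```python
from typing import List, Tuple


def scaled_sign(v: List[float]) -> List[float]:
    """Hypothetical compressor: one shared magnitude (mean |v_i|) plus a sign per entry.

    Assumes v is non-empty; zero entries are treated as positive for simplicity.
    """
    scale = sum(abs(x) for x in v) / len(v)
    return [scale if x >= 0 else -scale for x in v]


def ef_step(
    params: List[float], grad: List[float], error: List[float], lr: float
) -> Tuple[List[float], List[float]]:
    """One illustrative error-feedback step (assumed form, single worker)."""
    # Add the residual kept from the previous step to the scaled gradient.
    corrected = [lr * g + e for g, e in zip(grad, error)]
    # Only this compressed vector would need to be communicated.
    update = scaled_sign(corrected)
    # Whatever the compressor dropped is stored locally for the next step.
    new_error = [c - u for c, u in zip(corrected, update)]
    new_params = [p - u for p, u in zip(params, update)]
    return new_params, new_error


params, error = ef_step([1.0, -2.0], [0.5, -1.5], [0.0, 0.0], lr=1.0)
print(params, error)
```

The point of the residual is that information lost to compression in one step is re-injected in the next, rather than discarded.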