Step-Ahead Error Feedback for Distributed Training with Compressed Gradient

Authors: An Xu, Zhouyuan Huo, Heng Huang
Pages: 10478-10486

AAAI 2021

Reproducibility assessment (each variable is listed with its result and the supporting LLM response):
Research Type: Experimental. Both our theoretical and empirical results show that our new methods can handle the gradient mismatch problem. The experiments further show that, with common gradient compression schemes, training is even faster (in terms of training epochs) than both full-precision training and local error feedback, without performance loss.
Researcher Affiliation: Collaboration. 1) Electrical and Computer Engineering Department, University of Pittsburgh, PA, USA; 2) Google, Mountain View, CA, USA; 3) JD Finance America Corporation, Mountain View, CA, USA. {an.xu, heng.huang}@pitt.edu, zhouyuan.huo@gmail.com
Pseudocode: Yes. Algorithm 1, "Distributed Momentum SGD with Double Way Compression" (a hedged sketch of this structure appears after the assessment).
Open Source Code: No. The paper does not provide concrete access to source code for the methodology described.
Open Datasets: Yes. We train the ResNet-56 (He et al. 2016) model with multiple workers (GPUs) on the CIFAR-100 (Krizhevsky, Hinton et al. 2009) image classification task (a data-loading sketch appears after the assessment).
Dataset Splits: No. The paper does not provide specific dataset split information (exact percentages, sample counts, or detailed splitting methodology) for training, validation, and test sets.
Hardware Specification: No. The paper mentions "multiple workers (GPUs)" but does not provide specific hardware details such as exact GPU/CPU models, processor types, or memory amounts used for running its experiments.
Software Dependencies: No. All experiments are implemented with PyTorch (Paszke et al. 2019); beyond this, no versioned list of software dependencies is given.
Experiment Setup: Yes. The base learning rate is 0.1 and the total batch size is 128. The momentum constant is 0.9 and the weight decay is 5 × 10^-4. For momentum SGD the model is trained for 200 epochs with a learning rate decay of 0.1 at epochs 100 and 150 (a PyTorch sketch of this configuration follows the assessment).
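
For reference, below is a minimal single-process sketch of what a distributed momentum SGD loop with double-way (worker-to-server and server-to-worker) compression and error feedback can look like. It is not the paper's Algorithm 1: the top-k compressor, the placement of the momentum buffer on the server, the toy quadratic objective, and all variable names are assumptions made purely for illustration.

```python
import torch

def topk_compress(x, ratio=0.01):
    """Keep only the k largest-magnitude entries of x; zero out the rest."""
    k = max(1, int(x.numel() * ratio))
    flat = x.flatten()
    idx = flat.abs().topk(k).indices
    out = torch.zeros_like(flat)
    out[idx] = flat[idx]
    return out.view_as(x)

def train(num_workers=4, steps=100, lr=0.1, momentum=0.9):
    dim = 1000
    w = torch.randn(dim)                                         # shared model parameters
    m = torch.zeros(dim)                                         # server-side momentum buffer
    err_worker = [torch.zeros(dim) for _ in range(num_workers)]  # per-worker error memory
    err_server = torch.zeros(dim)                                # server error memory
    targets = [torch.randn(dim) for _ in range(num_workers)]     # toy local objectives

    for _ in range(steps):
        # Worker -> server: compress the error-corrected local gradient.
        msgs = []
        for i in range(num_workers):
            grad = w - targets[i]              # gradient of 0.5 * ||w - target_i||^2
            corrected = grad + err_worker[i]   # local error feedback
            c = topk_compress(corrected)
            err_worker[i] = corrected - c      # remember what was dropped
            msgs.append(c)

        # Server: aggregate, apply momentum, and form the model update.
        avg = torch.stack(msgs).mean(dim=0)
        m = momentum * m + avg
        update = lr * m + err_server           # server-side error feedback

        # Server -> workers: compress the update before broadcasting it.
        c_update = topk_compress(update)
        err_server = update - c_update         # remember the dropped part
        w = w - c_update                       # all workers apply the same compressed update

    return w

if __name__ == "__main__":
    print(train().norm())
```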
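The CIFAR-100 data referenced under Open Datasets can be obtained through torchvision. The augmentation below (random crop with padding 4, horizontal flip) and the normalization statistics follow the common CIFAR training recipe and are assumptions, not details reported in the assessment.

```python
import torch
from torchvision import datasets, transforms

CIFAR100_MEAN = (0.5071, 0.4865, 0.4409)
CIFAR100_STD = (0.2673, 0.2564, 0.2762)

train_tf = transforms.Compose([
    transforms.RandomCrop(32, padding=4),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    transforms.Normalize(CIFAR100_MEAN, CIFAR100_STD),
])
test_tf = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize(CIFAR100_MEAN, CIFAR100_STD),
])

train_set = datasets.CIFAR100("./data", train=True, download=True, transform=train_tf)
test_set = datasets.CIFAR100("./data", train=False, download=True, transform=test_tf)

# The total batch size of 128 would be split across workers in the distributed
# setting; a single-process loader is shown here for simplicity.
train_loader = torch.utils.data.DataLoader(train_set, batch_size=128, shuffle=True)
test_loader = torch.utils.data.DataLoader(test_set, batch_size=128, shuffle=False)
```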
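The hyperparameters listed under Experiment Setup map directly onto standard PyTorch objects. The sketch below reconstructs that configuration; the placeholder model and the use of MultiStepLR are assumptions (the paper trains ResNet-56 and reports only the decay schedule itself).

```python
import torch
import torch.nn as nn

model = nn.Linear(3 * 32 * 32, 100)  # placeholder; the paper uses ResNet-56

optimizer = torch.optim.SGD(
    model.parameters(),
    lr=0.1,             # base learning rate
    momentum=0.9,       # momentum constant
    weight_decay=5e-4,  # weight decay of 5 x 10^-4
)
scheduler = torch.optim.lr_scheduler.MultiStepLR(
    optimizer, milestones=[100, 150], gamma=0.1  # decay by 0.1 at epochs 100 and 150
)

for epoch in range(200):
    # ... one training epoch over the (distributed) CIFAR-100 loader ...
    scheduler.step()
```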