Step-Ahead Error Feedback for Distributed Training with Compressed Gradient

Authors: An Xu, Zhouyuan Huo, Heng Huang
Pages: 10478-10486

AAAI 2021

Reproducibility assessment (each variable is listed with its result and the supporting LLM response):
Research Type: Experimental. Both our theoretical and empirical results show that our new methods can handle the gradient mismatch problem. The experiments further show that, with common gradient compression schemes, training is even faster (in terms of training epochs) than both full-precision training and local error feedback, without performance loss.
Researcher Affiliation: Collaboration. 1) Electrical and Computer Engineering Department, University of Pittsburgh, PA, USA; 2) Google, Mountain View, CA, USA; 3) JD Finance America Corporation, Mountain View, CA, USA. {an.xu, heng.huang}@pitt.edu, zhouyuan.huo@gmail.com
Pseudocode: Yes. Algorithm 1, "Distributed Momentum SGD with Double Way Compression" (a hedged sketch of this structure appears after the assessment).
Open Source Code: No. The paper does not provide concrete access to source code for the methodology described.
Open Datasets: Yes. We train the ResNet-56 (He et al. 2016) model with multiple workers (GPUs) on the CIFAR-100 (Krizhevsky, Hinton et al. 2009) image classification task (a data-loading sketch appears after the assessment).
Dataset Splits: No. The paper does not provide specific dataset split information (exact percentages, sample counts, or detailed splitting methodology) for training, validation, and test sets.
Hardware Specification: No. The paper mentions "multiple workers (GPUs)" but does not provide specific hardware details such as exact GPU/CPU models, processor types, or memory amounts used for running its experiments.
Software Dependencies: No. All experiments are implemented with PyTorch (Paszke et al. 2019); beyond this, no versioned list of software dependencies is given.
Experiment Setup: Yes. The base learning rate is 0.1 and the total batch size is 128. The momentum constant is 0.9 and the weight decay is 5 × 10^-4. For momentum SGD the model is trained for 200 epochs with a learning rate decay of 0.1 at epochs 100 and 150 (a PyTorch sketch of this configuration follows the assessment).
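
For reference, below is a minimal single-process sketch of what a distributed momentum SGD loop with double-way (worker-to-server and server-to-worker) compression and error feedback can look like. It is not the paper's Algorithm 1: the top-k compressor, the placement of the momentum buffer on the server, the toy quadratic objective, and all variable names are assumptions made purely for illustration.

```python
import torch

def topk_compress(x, ratio=0.01):
    """Keep only the k largest-magnitude entries of x; zero out the rest."""
    k = max(1, int(x.numel() * ratio))
    flat = x.flatten()
    idx = flat.abs().topk(k).indices
    out = torch.zeros_like(flat)
    out[idx] = flat[idx]
    return out.view_as(x)

def train(num_workers=4, steps=100, lr=0.1, momentum=0.9):
    dim = 1000
    w = torch.randn(dim)                                         # shared model parameters
    m = torch.zeros(dim)                                         # server-side momentum buffer
    err_worker = [torch.zeros(dim) for _ in range(num_workers)]  # per-worker error memory
    err_server = torch.zeros(dim)                                # server error memory
    targets = [torch.randn(dim) for _ in range(num_workers)]     # toy local objectives

    for _ in range(steps):
        # Worker -> server: compress the error-corrected local gradient.
        msgs = []
        for i in range(num_workers):
            grad = w - targets[i]              # gradient of 0.5 * ||w - target_i||^2
            corrected = grad + err_worker[i]   # local error feedback
            c = topk_compress(corrected)
            err_worker[i] = corrected - c      # remember what was dropped
            msgs.append(c)

        # Server: aggregate, apply momentum, and form the model update.
        avg = torch.stack(msgs).mean(dim=0)
        m = momentum * m + avg
        update = lr * m + err_server           # server-side error feedback

        # Server -> workers: compress the update before broadcasting it.
        c_update = topk_compress(update)
        err_server = update - c_update         # remember the dropped part
        w = w - c_update                       # all workers apply the same compressed update

    return w

if __name__ == "__main__":
    print(train().norm())
```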
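The CIFAR-100 data referenced under Open Datasets can be obtained through torchvision. The augmentation below (random crop with padding 4, horizontal flip) and the normalization statistics follow the common CIFAR training recipe and are assumptions, not details reported in the assessment.

```python
import torch
from torchvision import datasets, transforms

CIFAR100_MEAN = (0.5071, 0.4865, 0.4409)
CIFAR100_STD = (0.2673, 0.2564, 0.2762)

train_tf = transforms.Compose([
    transforms.RandomCrop(32, padding=4),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    transforms.Normalize(CIFAR100_MEAN, CIFAR100_STD),
])
test_tf = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize(CIFAR100_MEAN, CIFAR100_STD),
])

train_set = datasets.CIFAR100("./data", train=True, download=True, transform=train_tf)
test_set = datasets.CIFAR100("./data", train=False, download=True, transform=test_tf)

# The total batch size of 128 would be split across workers in the distributed
# setting; a single-process loader is shown here for simplicity.
train_loader = torch.utils.data.DataLoader(train_set, batch_size=128, shuffle=True)
test_loader = torch.utils.data.DataLoader(test_set, batch_size=128, shuffle=False)
```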
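The hyperparameters listed under Experiment Setup map directly onto standard PyTorch objects. The sketch below reconstructs that configuration; the placeholder model and the use of MultiStepLR are assumptions (the paper trains ResNet-56 and reports only the decay schedule itself).

```python
import torch
import torch.nn as nn

model = nn.Linear(3 * 32 * 32, 100)  # placeholder; the paper uses ResNet-56

optimizer = torch.optim.SGD(
    model.parameters(),
    lr=0.1,             # base learning rate
    momentum=0.9,       # momentum constant
    weight_decay=5e-4,  # weight decay of 5 x 10^-4
)
scheduler = torch.optim.lr_scheduler.MultiStepLR(
    optimizer, milestones=[100, 150], gamma=0.1  # decay by 0.1 at epochs 100 and 150
)

for epoch in range(200):
    # ... one training epoch over the (distributed) CIFAR-100 loader ...
    scheduler.step()
```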