ScaleCom: Scalable Sparsified Gradient Compression for Communication-Efficient Distributed Training

Authors: Chia-Yu Chen, Jiamin Ni, Songtao Lu, Xiaodong Cui, Pin-Yu Chen, Xiao Sun, Naigang Wang, Swagath Venkataramani, Vijayalakshmi (Viji) Srinivasan, Wei Zhang, Kailash Gopalakrishnan

NeurIPS 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | 4 Experimental Results: We apply ScaleCom to three major applications: vision (ImageNet, CIFAR10), language (WMT14 En-De), and speech (SWB300).
Researcher Affiliation | Industry | Chia-Yu Chen, Jiamin Ni, Songtao Lu, Xiaodong Cui, Pin-Yu Chen, Xiao Sun, Naigang Wang, Swagath Venkataramani, Vijayalakshmi Srinivasan, Wei Zhang, Kailash Gopalakrishnan; IBM T. J. Watson Research Center, Yorktown Heights, NY 10598, USA; {cchen, cuix, xsun, nwang, viji, weiz, kailash}@us.ibm.com, {jiamin.ni, songtao, pin-yu.chen, swagath.venkataramani}@ibm.com
Pseudocode | Yes | Algorithm 1 (ScaleCom: Scalable Sparsified Gradient Compression)
Open Source Code | No | The paper does not provide an explicit statement about releasing the source code for the described methodology or a link to a code repository.
Open Datasets | Yes | We apply ScaleCom to three major applications: vision (ImageNet, CIFAR10), language (WMT14 En-De), and speech (SWB300).
Dataset Splits | No | The paper mentions various datasets (ImageNet, CIFAR10, WMT14 En-De, SWB300) and batch sizes, but does not explicitly provide specific percentages, sample counts, or a detailed methodology for dataset splits (e.g., train/validation/test) to enable reproduction of the data partitioning.
Hardware Specification | Yes | Experiments are run on IBM POWER System AC922 systems using implementations in PyTorch.
Software Dependencies | No | The paper mentions 'PyTorch' but does not specify its version number or any other software dependencies with their versions.
Experiment Setup | Yes | We use 1-5 warm-up epochs (<10% of total training epochs) for compression. A conservative engineering guidance is proposed for compression rate settings in each layer based upon the ratio FLOPs/gradient: 25X for ratio in the range [196, ∞); 50X for [128, 196]; and 400X for (0, 128]. ... this guidance is based on the per-worker mini-batch size, 32 for vision and speech and 4.5k for language. ... In these experiments, we adopt hyper-parameter settings from [1][3][5] (including learning rates and momentum)... we set β=1 in the low-pass filter... Once the proposed low-pass filter is applied (β=0.1), ScaleCom achieves almost identical test accuracies.
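To make the quoted setup more concrete, the sketch below illustrates how a per-layer compression rate could be derived from the FLOPs/gradient guidance and how a β low-pass filter could be combined with top-k sparsification and error feedback. This is a minimal PyTorch-style sketch under stated assumptions: the function names (choose_compression_rate, compress_layer_gradient) are hypothetical, and the EMA placement of the β filter is a simplification for illustration, not the exact update rule of the paper's Algorithm 1.

```python
import torch

def choose_compression_rate(flops_to_gradient_ratio: float) -> int:
    # Per-layer compression rate from the quoted FLOPs/gradient guidance.
    if flops_to_gradient_ratio >= 196:
        return 25    # 25x for compute-heavy layers
    if flops_to_gradient_ratio >= 128:
        return 50    # 50x for the middle range [128, 196]
    return 400       # 400x for gradient-heavy layers, ratio in (0, 128]

def compress_layer_gradient(grad, memory, filtered, compression_rate, beta=0.1):
    # EMA-style low-pass filter on the incoming gradient; beta=1 disables it.
    # (The placement of the filter here is an illustrative assumption, not the
    # paper's exact Algorithm 1 update.)
    filtered = beta * grad + (1.0 - beta) * filtered
    # Error feedback: fold the previously unsent residual back in.
    acc = memory + filtered
    # Keep only the largest-magnitude 1/compression_rate fraction of entries.
    k = max(1, acc.numel() // compression_rate)
    _, idx = torch.topk(acc.abs().flatten(), k)
    sparse = torch.zeros_like(acc).flatten()
    sparse[idx] = acc.flatten()[idx]
    sparse = sparse.view_as(acc)
    # Remember what was not sent for the next iteration.
    memory = acc - sparse
    return sparse, memory, filtered
```

During the 1-5 warm-up epochs quoted above, compression would simply be skipped and dense gradients exchanged; afterwards each layer would keep its own memory and filtered state across iterations.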