Qsparse-local-SGD: Distributed SGD with Quantization, Sparsification and Local Computations

Authors: Debraj Basu, Deepesh Data, Can Karakus, Suhas Diggavi

NeurIPS 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We use Qsparse-local-SGD to train ResNet-50 on ImageNet, and show that it results in significant savings over the state-of-the-art, in the number of bits transmitted to reach target accuracy.
Researcher Affiliation | Collaboration | Debraj Basu (Adobe Inc., dbasu@adobe.com); Deepesh Data (UCLA, deepeshdata@ucla.edu); Can Karakus (Amazon Inc., cakarak@amazon.com); Suhas Diggavi (UCLA, suhasdiggavi@ucla.edu)
Pseudocode | Yes | Algorithm 1: Qsparse-local-SGD (a sketch of the compressed update step appears after this table)
Open Source Code | Yes | Our implementation is available at https://github.com/karakusc/horovod/tree/qsparselocal.
Open Datasets | Yes | We implement Qsparse-local-SGD for ResNet-50 using the ImageNet dataset, and show that we achieve target accuracies... We also perform analogous experiments on the MNIST [19] handwritten digits dataset for softmax regression with a standard ℓ2 regularizer...
Dataset Splits | No | The paper mentions using the ImageNet and MNIST datasets and discusses training and testing, but does not provide specific details on how these datasets were split into training, validation, or test sets (e.g., percentages or sample counts for each split).
Hardware Specification | Yes | We train ResNet-50 [13] (which has d = 25,610,216 parameters) on the ImageNet dataset, using 8 NVIDIA Tesla V100 GPUs.
Software Dependencies | No | The paper mentions using the Horovod framework [28] but does not specify its version number or any other software dependencies with version numbers.
Experiment Setup | Yes | We use a learning rate schedule consisting of 5 epochs of linear warmup, followed by a piecewise decay of 0.1 at epochs 30, 60 and 80, with a batch size of 256 per GPU. For experiments, we focus on SGD with momentum of 0.9, applied on the local iterations of the workers.
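
The Pseudocode row refers to Algorithm 1 (Qsparse-local-SGD), which interleaves local SGD steps with sparsified, quantized, error-compensated updates at synchronization rounds. The following is a minimal NumPy sketch of what the compression step at a synchronization round could look like; the helper names (top_k, sign_quantize, compress_update) and the use of scaled sign quantization are illustrative assumptions rather than the paper's exact operators.

```python
import numpy as np

def top_k(v, k):
    """Sparsification: keep the k largest-magnitude entries of v, zero out the rest."""
    out = np.zeros_like(v)
    idx = np.argpartition(np.abs(v), -k)[-k:]
    out[idx] = v[idx]
    return out

def sign_quantize(v):
    """Quantization (assumed scaled-sign variant): one sign per entry plus a single scale."""
    nonzero = v[v != 0]
    if nonzero.size == 0:
        return np.zeros_like(v)
    return np.mean(np.abs(nonzero)) * np.sign(v)

def compress_update(local_update, memory, k):
    """Error-compensated compression of a worker's accumulated local update.

    local_update: sum of the SGD steps taken locally since the last synchronization.
    memory: residual error carried over from earlier rounds (error feedback).
    Returns the message to transmit and the updated memory.
    """
    corrected = local_update + memory
    message = sign_quantize(top_k(corrected, k))
    new_memory = corrected - message  # keep whatever was not transmitted
    return message, new_memory
```

In this sketch a worker transmits only the nonzero positions, their signs, and one scale per round; the averaged messages update the global model, and new_memory is carried into the next round. It is intended only to illustrate the structure of the compressed update, not to reproduce Algorithm 1 exactly.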
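
The Experiment Setup row fully determines the shape of the learning-rate schedule, if not its starting value. Below is a small Python sketch of the per-epoch learning rate implied by that description; base_lr = 0.1 is an assumed starting point, not a value quoted from the paper.

```python
def lr_at_epoch(epoch, base_lr=0.1):
    """Learning rate implied by the reported schedule: 5 epochs of linear
    warmup followed by multiplicative 0.1 decays at epochs 30, 60, and 80.
    base_lr is an assumption; the excerpt above does not state the peak rate."""
    if epoch < 5:
        # Linear warmup over the first 5 epochs.
        return base_lr * (epoch + 1) / 5
    factor = 1.0
    for milestone in (30, 60, 80):
        if epoch >= milestone:
            factor *= 0.1  # piecewise decay at each milestone
    return base_lr * factor
```

Together with a per-GPU batch size of 256 and SGD with momentum 0.9 applied to the workers' local iterations, evaluating this function once per epoch gives the schedule shape described in the quote above.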