Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Qsparse-local-SGD: Distributed SGD with Quantization, Sparsification and Local Computations

Authors: Debraj Basu, Deepesh Data, Can Karakus, Suhas Diggavi

NeurIPS 2019 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We use Qsparse-local-SGD to train ResNet-50 on ImageNet, and show that it results in significant savings over the state-of-the-art in the number of bits transmitted to reach target accuracy.
Researcher Affiliation | Collaboration | Debraj Basu (Adobe Inc.), Deepesh Data (UCLA), Can Karakus (Amazon Inc.), Suhas Diggavi (UCLA)
Pseudocode | Yes | Algorithm 1 Qsparse-local-SGD
Open Source Code | Yes | Our implementation is available at https://github.com/karakusc/horovod/tree/qsparselocal.
Open Datasets | Yes | We implement Qsparse-local-SGD for ResNet-50 using the ImageNet dataset, and show that we achieve target accuracies... We also perform analogous experiments on the MNIST [19] handwritten digits dataset for softmax regression with a standard l2 regularizer...
Dataset Splits | No | The paper mentions using the ImageNet and MNIST datasets and discusses training and testing, but does not provide specific details on how these datasets were split into training, validation, or test sets (e.g., percentages or sample counts for each split).
Hardware Specification | Yes | We train ResNet-50 [13] (which has d = 25,610,216 parameters) on the ImageNet dataset, using 8 NVIDIA Tesla V100 GPUs.
Software Dependencies | No | The paper mentions using the Horovod framework [28] but does not specify its version number or any other software dependencies with version numbers.
Experiment Setup | Yes | We use a learning rate schedule consisting of 5 epochs of linear warmup, followed by a piecewise decay of 0.1 at epochs 30, 60 and 80, with a batch size of 256 per GPU. For experiments, we focus on SGD with momentum of 0.9, applied on the local iterations of the workers.
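The reported schedule (5 epochs of linear warmup, then piecewise decay by 0.1 at epochs 30, 60 and 80) can be sketched as a small helper; note that the base learning rate of 0.1 used here is an assumption for illustration, as the excerpt above does not state the paper's base value:

```python
def lr_at_epoch(epoch, base_lr=0.1, warmup_epochs=5,
                milestones=(30, 60, 80), decay=0.1):
    """Sketch of the reported schedule: linear warmup for the first
    `warmup_epochs` epochs, then multiply the rate by `decay` at each
    milestone epoch. `base_lr=0.1` is a hypothetical value."""
    if epoch < warmup_epochs:
        # Linear warmup: ramp from base_lr/warmup_epochs up to base_lr.
        return base_lr * (epoch + 1) / warmup_epochs
    # Piecewise decay: one factor of `decay` per milestone already passed.
    factor = decay ** sum(epoch >= m for m in milestones)
    return base_lr * factor
```

With these assumed values, the rate ramps to 0.1 by epoch 4, drops to 0.01 at epoch 30, 0.001 at epoch 60, and 0.0001 at epoch 80; in the paper's setting this schedule is applied with SGD momentum 0.9 on the workers' local iterations.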