Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Qsparse-local-SGD: Distributed SGD with Quantization, Sparsification and Local Computations
Authors: Debraj Basu, Deepesh Data, Can Karakus, Suhas Diggavi
NeurIPS 2019 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We use Qsparse-local-SGD to train ResNet-50 on ImageNet, and show that it results in significant savings over the state-of-the-art in the number of bits transmitted to reach target accuracy. |
| Researcher Affiliation | Collaboration | Debraj Basu (Adobe Inc.), Deepesh Data (UCLA), Can Karakus (Amazon Inc.), Suhas Diggavi (UCLA) |
| Pseudocode | Yes | Algorithm 1 Qsparse-local-SGD (a hedged sketch of the algorithm's core idea appears after this table) |
| Open Source Code | Yes | Our implementation is available at https://github.com/karakusc/horovod/tree/qsparselocal. |
| Open Datasets | Yes | We implement Qsparse-local-SGD for ResNet-50 using the ImageNet dataset, and show that we achieve target accuracies... We also perform analogous experiments on the MNIST [19] handwritten digits dataset for softmax regression with a standard l2 regularizer... |
| Dataset Splits | No | The paper mentions using ImageNet and MNIST datasets and discusses training and testing, but does not provide specific details on how these datasets were split into training, validation, or test sets (e.g., percentages or sample counts for each split). |
| Hardware Specification | Yes | We train ResNet-50 [13] (which has d = 25,610,216 parameters) on the ImageNet dataset, using 8 NVIDIA Tesla V100 GPUs. |
| Software Dependencies | No | The paper mentions using the 'Horovod framework [28]' but does not specify its version number or any other software dependencies with version numbers. |
| Experiment Setup | Yes | We use a learning rate schedule consisting of 5 epochs of linear warmup, followed by a piecewise decay of 0.1 at epochs 30, 60 and 80, with a batch size of 256 per GPU. For experiments, we focus on SGD with momentum of 0.9, applied on the local iterations of the workers. (A sketch of this learning-rate schedule appears below the table.) |
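The paper's Algorithm 1 combines three ingredients: local SGD steps at each worker, a composed sparsification-and-quantization operator applied to the accumulated update at synchronization, and an error-feedback memory that carries the compression error forward. The snippet below is a minimal single-machine sketch of that idea under simplifying assumptions, not the authors' Horovod-based implementation; the top-k sparsifier, scaled-sign quantizer, function names, and the reset-to-global simplification are illustrative choices.

```python
import numpy as np

def topk_sparsify(v, k):
    """Keep the k largest-magnitude entries of v, zero out the rest."""
    out = np.zeros_like(v)
    idx = np.argpartition(np.abs(v), -k)[-k:]
    out[idx] = v[idx]
    return out

def sign_quantize(v):
    """Scaled-sign quantizer: transmit sign(v) scaled by the mean magnitude."""
    return np.mean(np.abs(v)) * np.sign(v)

def qsparse_local_sgd(worker_grad_fns, x0, lr, k, num_rounds, local_steps):
    """Simplified single-machine simulation (assumption, not the paper's code).

    worker_grad_fns[i](x) returns a stochastic gradient for worker i at x.
    Each worker runs `local_steps` local SGD steps, compresses its accumulated
    update with quantization + top-k sparsification, and keeps the compression
    error in a local memory that is added back at the next synchronization.
    """
    x = x0.copy()
    memory = [np.zeros_like(x0) for _ in worker_grad_fns]
    for _ in range(num_rounds):
        updates = []
        for i, grad_fn in enumerate(worker_grad_fns):
            x_local = x.copy()
            for _ in range(local_steps):                 # local computation
                x_local -= lr * grad_fn(x_local)
            delta = x_local - x                          # accumulated local update
            compressed = sign_quantize(topk_sparsify(delta + memory[i], k))
            memory[i] += delta - compressed              # error-feedback memory
            updates.append(compressed)
        x += np.mean(updates, axis=0)                    # average compressed updates
    return x
```

In the actual distributed setting the compressed updates are communicated between workers (the released implementation builds on Horovod, linked above) rather than averaged in-process as in this sketch.

The quoted experiment setup also describes a learning-rate schedule: 5 epochs of linear warmup followed by step decays of 0.1 at epochs 30, 60 and 80. A minimal sketch of such a schedule is below; the function name and the base learning rate are assumptions, since the report does not quote the base rate.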
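```python
def lr_schedule(epoch, base_lr, warmup_epochs=5, decay_epochs=(30, 60, 80), decay=0.1):
    """Warmup-then-step schedule matching the quoted setup (base_lr is assumed).

    Epochs 0..4: linear warmup toward base_lr; afterwards the rate is multiplied
    by 0.1 at each boundary in decay_epochs.
    """
    if epoch < warmup_epochs:
        return base_lr * (epoch + 1) / warmup_epochs
    factor = 1.0
    for boundary in decay_epochs:
        if epoch >= boundary:
            factor *= decay
    return base_lr * factor

# Example: lr_schedule(0, 0.1) -> 0.02, lr_schedule(35, 0.1) -> 0.01, lr_schedule(85, 0.1) -> 0.0001
```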