Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
Adaptive Gradient Quantization for Data-Parallel SGD
Authors: Fartash Faghri, Iman Tabrizian, Ilia Markov, Dan Alistarh, Daniel M. Roy, Ali Ramezani-Kebrya
NeurIPS 2020 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We improve the validation accuracy by almost 2% on CIFAR-10 and 1% on ImageNet in challenging low-cost communication setups. |
| Researcher Affiliation | Collaboration | (1) University of Toronto; (2) Vector Institute; (3) IST Austria; (4) Neural Magic |
| Pseudocode | Yes | Algorithm 1: Adaptive data-parallel SGD. |
| Open Source Code | Yes | Open source code: http://github.com/tabrizian/learning-to-quantize |
| Open Datasets | Yes | We present results for training ResNet-32 and ResNet-110 [28] on CIFAR-10 [29], and ResNet-18 on ImageNet [30]. |
| Dataset Splits | Yes | We present results for training ResNet-32 and ResNet-110 [28] on CIFAR-10 [29], and ResNet-18 on ImageNet [30]. We improve the validation accuracy by almost 2% on CIFAR-10 and 1% on ImageNet in challenging low-cost communication setups. |
| Hardware Specification | No | The paper mentions '4-GPUs' and '16 and 32 GPUs' for training, but does not specify any exact GPU models (e.g., NVIDIA A100, Tesla V100), CPU models, or other detailed hardware specifications. |
| Software Dependencies | No | The paper does not list specific software dependencies with version numbers. |
| Experiment Setup | Yes | Learning rate is decayed by a factor of 10 twice at 40K and 60K iterations. All quantization methods studied in this section share two hyper-parameters: the number of bits (log2 of number of quantization levels) and a bucket size. Bucket size for ResNet-110 trained on CIFAR-10 is 16384, for ResNet-32 is 8192, and for ResNet-18 on ImageNet is 8192. Using only 3 bits (8 levels)... |
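
The Experiment Setup row names a step learning-rate schedule and two quantization hyper-parameters (bits and bucket size). The sketch below shows one way these settings could fit together. It is not the paper's adaptive method: `quantize_bucket` uses plain uniform stochastic quantization as a stand-in, and the function names, the evenly spaced level layout, the per-bucket max-magnitude scaling, and the choice to apply each decay from the quoted iteration onward are all assumptions made for illustration. The base learning rate is not given in the quoted text, so it is left as a parameter.

```python
import math
import random

# Values quoted in the Experiment Setup row.
DECAY_ITERATIONS = (40_000, 60_000)
BUCKET_SIZES = {
    ("ResNet-110", "CIFAR-10"): 16384,
    ("ResNet-32", "CIFAR-10"): 8192,
    ("ResNet-18", "ImageNet"): 8192,
}


def learning_rate(base_lr, iteration):
    """Divide base_lr by 10 for every decay point already reached.

    Assumption: the decay takes effect from the listed iteration onward.
    """
    drops = sum(1 for t in DECAY_ITERATIONS if iteration >= t)
    return base_lr / (10 ** drops)


def num_levels(bits):
    """The number of bits is log2 of the number of quantization levels."""
    return 2 ** bits


def quantize_bucket(bucket, bits, rng):
    """Uniform stochastic quantization of one bucket (illustrative stand-in).

    Assumption: levels are evenly spaced on [-scale, scale], where scale is
    the largest magnitude in the bucket.
    """
    scale = max(abs(v) for v in bucket)
    if scale == 0.0:
        return list(bucket)
    steps = num_levels(bits) - 1
    out = []
    for v in bucket:
        pos = (v / scale + 1.0) / 2.0 * steps  # position in [0, steps]
        low = math.floor(pos)
        # Round up with probability equal to the fractional part, so the
        # quantized value equals v in expectation.
        idx = low + (1 if rng.random() < pos - low else 0)
        idx = min(idx, steps)
        out.append((idx / steps * 2.0 - 1.0) * scale)
    return out


def quantize_gradient(grad, bits, bucket_size, rng):
    """Split a flat gradient into buckets and quantize each independently."""
    out = []
    for start in range(0, len(grad), bucket_size):
        out.extend(quantize_bucket(grad[start:start + bucket_size], bits, rng))
    return out
```

For example, `num_levels(3)` gives the 8 levels mentioned in the row, and `quantize_gradient(grad, 3, BUCKET_SIZES[("ResNet-32", "CIFAR-10")], random.Random(0))` would quantize a flat list of floats in buckets of 8192 entries. Scaling each bucket separately keeps a single large gradient entry from flattening the resolution of the whole vector, which is the usual motivation for a bucket size.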