Adaptive Gradient Quantization for Data-Parallel SGD
Authors: Fartash Faghri, Iman Tabrizian, Ilia Markov, Dan Alistarh, Daniel M. Roy, Ali Ramezani-Kebrya
NeurIPS 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We improve the validation accuracy by almost 2% on CIFAR-10 and 1% on ImageNet in challenging low-cost communication setups. |
| Researcher Affiliation | Collaboration | ¹University of Toronto, ²Vector Institute, ³IST Austria, ⁴Neural Magic |
| Pseudocode | Yes | Algorithm 1: Adaptive data-parallel SGD. |
| Open Source Code | Yes | Open source code: http://github.com/tabrizian/learning-to-quantize |
| Open Datasets | Yes | We present results for training ResNet-32 and ResNet-110 [28] on CIFAR-10 [29], and ResNet-18 on ImageNet [30]. |
| Dataset Splits | Yes | We present results for training ResNet-32 and ResNet-110 [28] on CIFAR-10 [29], and ResNet-18 on ImageNet [30]. We improve the validation accuracy by almost 2% on CIFAR-10 and 1% on ImageNet in challenging low-cost communication setups. |
| Hardware Specification | No | The paper mentions '4-GPUs' and '16 and 32 GPUs' for training, but does not specify any exact GPU models (e.g., NVIDIA A100, Tesla V100), CPU models, or other detailed hardware specifications. |
| Software Dependencies | No | The paper does not list specific software dependencies with version numbers. |
| Experiment Setup | Yes | Learning rate is decayed by a factor of 10 twice, at 40K and 60K iterations. All quantization methods studied in this section share two hyper-parameters: the number of bits (log2 of the number of quantization levels) and a bucket size. The bucket size for ResNet-110 trained on CIFAR-10 is 16384, for ResNet-32 it is 8192, and for ResNet-18 on ImageNet it is 8192. Using only 3 bits (8 levels)... |
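
The bits and bucket-size hyper-parameters quoted in the Experiment Setup row fully specify a bucketed stochastic gradient quantizer. Below is a minimal sketch, assuming uniform levels with per-bucket L2-norm scaling and unbiased stochastic rounding; the function name and NumPy implementation are illustrative assumptions and are not taken from the released code (the paper's adaptive scheme additionally adjusts the level placement during training rather than keeping it uniform).

```python
import numpy as np

def quantize_bucketed(grad, bits=3, bucket_size=8192, rng=None):
    """Hypothetical sketch: stochastically quantize a flat gradient, bucket by bucket.

    Each bucket is scaled by its own L2 norm and its entries are rounded,
    unbiasedly, to one of 2**bits uniform levels.
    """
    rng = np.random.default_rng() if rng is None else rng
    levels = 2 ** bits - 1                      # number of quantization intervals
    out = np.empty_like(grad)
    for start in range(0, grad.size, bucket_size):
        end = start + bucket_size
        bucket = grad[start:end]
        norm = np.linalg.norm(bucket)
        if norm == 0.0:
            out[start:end] = 0.0
            continue
        scaled = np.abs(bucket) / norm * levels  # position on the level grid, in [0, levels]
        lower = np.floor(scaled)
        prob = scaled - lower                    # round up with this probability (unbiased)
        quantized = lower + (rng.random(bucket.shape) < prob)
        out[start:end] = np.sign(bucket) * quantized / levels * norm
    return out
```

For instance, `quantize_bucketed(g, bits=3, bucket_size=8192)` on a flattened ResNet-18 gradient would correspond to the 3-bit, 8192-bucket setting reported above; in a data-parallel loop each worker would then communicate the per-bucket norms plus packed level indices instead of the raw gradient.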