Adaptive Gradient Quantization for Data-Parallel SGD

Authors: Fartash Faghri, Iman Tabrizian, Ilia Markov, Dan Alistarh, Daniel M. Roy, Ali Ramezani-Kebrya

NeurIPS 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We improve the validation accuracy by almost 2% on CIFAR-10 and 1% on ImageNet in challenging low-cost communication setups.
Researcher Affiliation | Collaboration | 1 University of Toronto, 2 Vector Institute, 3 IST Austria, 4 Neural Magic
Pseudocode | Yes | Algorithm 1: Adaptive data-parallel SGD.
Open Source Code | Yes | Open source code: http://github.com/tabrizian/learning-to-quantize
Open Datasets | Yes | We present results for training ResNet-32 and ResNet-110 [28] on CIFAR-10 [29], and ResNet-18 on ImageNet [30].
Dataset Splits | Yes | We present results for training ResNet-32 and ResNet-110 [28] on CIFAR-10 [29], and ResNet-18 on ImageNet [30]. We improve the validation accuracy by almost 2% on CIFAR-10 and 1% on ImageNet in challenging low-cost communication setups.
Hardware Specification | No | The paper mentions training on '4-GPU' and '16 and 32 GPU' setups, but does not specify exact GPU models (e.g., NVIDIA A100, Tesla V100), CPU models, or other detailed hardware specifications.
Software Dependencies | No | The paper does not list specific software dependencies with version numbers.
Experiment Setup | Yes | Learning rate is decayed by a factor of 10 twice, at 40K and 60K iterations. All quantization methods studied in this section share two hyper-parameters: the number of bits (log2 of the number of quantization levels) and a bucket size. The bucket size for ResNet-110 trained on CIFAR-10 is 16384, for ResNet-32 is 8192, and for ResNet-18 on ImageNet is 8192. Using only 3 bits (8 levels)...
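The two shared hyper-parameters quoted in the Experiment Setup row (number of bits and bucket size) correspond to generic bucketed gradient quantization. As a point of reference only, below is a minimal PyTorch-style sketch of bucketed stochastic uniform quantization with 2**num_bits levels. All names here are hypothetical, and the paper's actual contribution, adapting the placement of the quantization levels during training, is not implemented.

```python
import torch

def quantize_bucketed(grad: torch.Tensor, num_bits: int = 3, bucket_size: int = 8192) -> torch.Tensor:
    """Illustrative bucketed stochastic uniform quantization (not the paper's adaptive scheme)."""
    num_levels = 2 ** num_bits
    flat = grad.flatten()
    # Pad so the gradient splits evenly into buckets.
    pad = (-flat.numel()) % bucket_size
    flat = torch.cat([flat, flat.new_zeros(pad)])
    buckets = flat.view(-1, bucket_size)

    # Normalize each bucket by its L2 norm; the scale would be sent alongside the level indices.
    scale = buckets.norm(dim=1, keepdim=True).clamp_min(1e-12)
    normalized = buckets / scale  # values in [-1, 1]

    # Map to [0, num_levels - 1] and round stochastically, so the quantizer is unbiased.
    pos = (normalized + 1) / 2 * (num_levels - 1)
    lower = pos.floor()
    idx = lower + torch.bernoulli(pos - lower)

    # Dequantize back to the original range (what a receiving worker would reconstruct).
    dequant = (idx / (num_levels - 1) * 2 - 1) * scale
    return dequant.flatten()[:grad.numel()].view_as(grad)
```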
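Building on that sketch, a simulated data-parallel SGD step in the spirit of Algorithm 1 might look as follows: each worker quantizes its local gradient, the dequantized gradients are averaged, and the shared parameters are updated. This is an assumed illustration rather than the released implementation; in particular, the adaptive update of the quantization levels is omitted.

```python
def data_parallel_step(params, per_worker_grads, lr):
    """One simulated data-parallel SGD step with quantized gradient exchange.

    per_worker_grads[i] is a list containing one gradient tensor per worker
    for params[i]; in a real run these would be exchanged over the network.
    """
    for p, grads in zip(params, per_worker_grads):
        quantized = [quantize_bucketed(g) for g in grads]
        avg_grad = torch.stack(quantized).mean(dim=0)
        p.data.add_(avg_grad, alpha=-lr)
```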