Adaptive Gradient Quantization for Data-Parallel SGD

Authors: Fartash Faghri, Iman Tabrizian, Ilia Markov, Dan Alistarh, Daniel M. Roy, Ali Ramezani-Kebrya

NeurIPS 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We improve the validation accuracy by almost 2% on CIFAR-10 and 1% on ImageNet in challenging low-cost communication setups.
Researcher Affiliation | Collaboration | 1 University of Toronto, 2 Vector Institute, 3 IST Austria, 4 Neural Magic
Pseudocode | Yes | Algorithm 1: Adaptive data-parallel SGD.
Open Source Code | Yes | Open source code: http://github.com/tabrizian/learning-to-quantize
Open Datasets | Yes | We present results for training ResNet-32 and ResNet-110 [28] on CIFAR-10 [29], and ResNet-18 on ImageNet [30].
Dataset Splits | Yes | We present results for training ResNet-32 and ResNet-110 [28] on CIFAR-10 [29], and ResNet-18 on ImageNet [30]. We improve the validation accuracy by almost 2% on CIFAR-10 and 1% on ImageNet in challenging low-cost communication setups.
Hardware Specification | No | The paper mentions training on '4-GPU' and '16 and 32 GPU' setups, but does not specify exact GPU models (e.g., NVIDIA A100, Tesla V100), CPU models, or other detailed hardware specifications.
Software Dependencies | No | The paper does not list specific software dependencies with version numbers.
Experiment Setup | Yes | Learning rate is decayed by a factor of 10 twice, at 40K and 60K iterations. All quantization methods studied in this section share two hyper-parameters: the number of bits (log2 of the number of quantization levels) and a bucket size. The bucket size for ResNet-110 trained on CIFAR-10 is 16384, for ResNet-32 is 8192, and for ResNet-18 on ImageNet is 8192. Using only 3 bits (8 levels)...
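The two shared hyper-parameters quoted in the Experiment Setup row (number of bits and bucket size) correspond to generic bucketed gradient quantization. As a point of reference only, below is a minimal PyTorch-style sketch of bucketed stochastic uniform quantization with 2**num_bits levels. All names here are hypothetical, and the paper's actual contribution, adapting the placement of the quantization levels during training, is not implemented.

```python
import torch

def quantize_bucketed(grad: torch.Tensor, num_bits: int = 3, bucket_size: int = 8192) -> torch.Tensor:
    """Illustrative bucketed stochastic uniform quantization (not the paper's adaptive scheme)."""
    num_levels = 2 ** num_bits
    flat = grad.flatten()
    # Pad so the gradient splits evenly into buckets.
    pad = (-flat.numel()) % bucket_size
    flat = torch.cat([flat, flat.new_zeros(pad)])
    buckets = flat.view(-1, bucket_size)

    # Normalize each bucket by its L2 norm; the scale would be sent alongside the level indices.
    scale = buckets.norm(dim=1, keepdim=True).clamp_min(1e-12)
    normalized = buckets / scale  # values in [-1, 1]

    # Map to [0, num_levels - 1] and round stochastically, so the quantizer is unbiased.
    pos = (normalized + 1) / 2 * (num_levels - 1)
    lower = pos.floor()
    idx = lower + torch.bernoulli(pos - lower)

    # Dequantize back to the original range (what a receiving worker would reconstruct).
    dequant = (idx / (num_levels - 1) * 2 - 1) * scale
    return dequant.flatten()[:grad.numel()].view_as(grad)
```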
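Building on that sketch, a simulated data-parallel SGD step in the spirit of Algorithm 1 might look as follows: each worker quantizes its local gradient, the dequantized gradients are averaged, and the shared parameters are updated. This is an assumed illustration rather than the released implementation; in particular, the adaptive update of the quantization levels is omitted.

```python
def data_parallel_step(params, per_worker_grads, lr):
    """One simulated data-parallel SGD step with quantized gradient exchange.

    per_worker_grads[i] is a list containing one gradient tensor per worker
    for params[i]; in a real run these would be exchanged over the network.
    """
    for p, grads in zip(params, per_worker_grads):
        quantized = [quantize_bucketed(g) for g in grads]
        avg_grad = torch.stack(quantized).mean(dim=0)
        p.data.add_(avg_grad, alpha=-lr)
```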