Scalable methods for 8-bit training of neural networks

Authors: Ron Banner, Itay Hubara, Elad Hoffer, Daniel Soudry

NeurIPS 2018 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our simulations show that Range BN is equivalent to the traditional batch norm if a precise scale adjustment, which can be approximated analytically, is applied. Experiments on ImageNet with ResNet-18 and ResNet-50 showed no distinguishable difference in accuracy between Range BN and traditional BN. We evaluated the ideas of Range Batch-Norm and Quantized Back-Propagation on multiple different models and datasets. The code to replicate all of our experiments is available online. Experiment results on the CIFAR-10 dataset. Experiment results on the ImageNet dataset. (A hedged Range BN sketch follows this table.)
Researcher Affiliation | Collaboration | Ron Banner (1), Itay Hubara (2), Elad Hoffer (2), Daniel Soudry (2); {itayhubara, elad.hoffer, daniel.soudry}@gmail.com; {ron.banner}@intel.com; (1) Intel Artificial Intelligence Products Group (AIPG); (2) Technion - Israel Institute of Technology, Haifa, Israel
Pseudocode | No | The paper does not contain any clearly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | The code to replicate all of our experiments is available online: https://github.com/eladhoffer/quantized.pytorch
Open Datasets | Yes | To the best of the authors' knowledge, this work is the first to quantize the weights, activations, as well as a substantial volume of the gradients stream, in all layers (including batch normalization) to 8-bit while showing state-of-the-art results over the ImageNet-1K dataset. Experiments on ImageNet with ResNet-18 and ResNet-50 showed no distinguishable difference in accuracy between Range BN and traditional BN. Experiment results on the CIFAR-10 dataset.
Dataset Splits | No | The paper mentions 'validation accuracy' and evaluates on datasets such as ImageNet and CIFAR-10, but it does not specify explicit percentages or sample counts for the training, validation, or test splits. It implicitly relies on the standard splits of these public datasets without stating them.
Hardware Specification | Yes | A Titan Xp used for this research was donated by the NVIDIA Corporation.
Software Dependencies | No | The paper mentions using the 'GEMMLOWP quantization scheme as described in Google's open source library [1]' but does not provide specific version numbers for GEMMLOWP or any other software dependencies such as Python or PyTorch. (A sketch of GEMMLOWP-style uniform quantization follows this table.)
Experiment Setup | Yes | To validate this low-precision scheme, we quantized the vast majority of operations to 8-bit. The only operations left at higher precision were the updates (float32) needed to accumulate small changes from stochastic gradient descent, and a copy of the layer gradients at 16 bits needed to compute g_W. The float32 updates are done once per minibatch while the propagations are done for each example (e.g., for a minibatch of 256 examples the updates constitute less than 0.4% of the training effort). (A sketch of this precision split follows this table.)
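
The Range BN result quoted above refers to replacing the per-channel standard deviation of batch normalization with the range (max minus min) of the centered activations, scaled analytically by C(n) = 1/sqrt(2 ln n). The PyTorch sketch below is a minimal illustration under that assumption; the module name and interface are ours, not taken from the released quantized.pytorch code, and running statistics for inference are omitted for brevity.

```python
import math
import torch
import torch.nn as nn

class RangeBN2d(nn.Module):
    """Sketch of Range Batch-Norm for (N, C, H, W) inputs: the per-channel
    standard deviation of standard BN is replaced by the per-channel range
    (max - min) of the centered activations, scaled by C(n) = 1/sqrt(2*ln n)."""

    def __init__(self, num_features, eps=1e-5):
        super().__init__()
        self.weight = nn.Parameter(torch.ones(num_features))   # gamma
        self.bias = nn.Parameter(torch.zeros(num_features))    # beta
        self.eps = eps

    def forward(self, x):
        n = x.numel() // x.size(1)                  # samples per channel
        c_n = 1.0 / math.sqrt(2.0 * math.log(n))    # analytic scale adjustment
        xc = x - x.mean(dim=(0, 2, 3), keepdim=True)
        rng = (xc.amax(dim=(0, 2, 3), keepdim=True)
               - xc.amin(dim=(0, 2, 3), keepdim=True))
        x_hat = xc / (c_n * rng + self.eps)
        return x_hat * self.weight.view(1, -1, 1, 1) + self.bias.view(1, -1, 1, 1)
```

Swapping this module in place of nn.BatchNorm2d in a ResNet block is enough to compare the two numerically, which is essentially the comparison the Research Type row describes.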
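
The Software Dependencies row cites the GEMMLOWP quantization scheme. Below is a minimal sketch of that style of uniform (affine) quantization over a tensor's [min, max] range; the function name and the quantize-dequantize round trip are our own illustration for simulation purposes, not GEMMLOWP's API.

```python
import torch

def quantize_uniform(x, num_bits=8):
    """GEMMLOWP-style uniform quantization sketch: map the tensor's
    [min, max] range onto the integer grid [0, 2**num_bits - 1],
    then de-quantize back to float for simulation."""
    qmax = 2 ** num_bits - 1
    x_min, x_max = x.min(), x.max()
    scale = (x_max - x_min).clamp(min=1e-8) / qmax     # step size
    q = ((x - x_min) / scale).round().clamp(0, qmax)   # integer codes
    return q * scale + x_min                           # de-quantized values
```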
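
The Experiment Setup row describes a precision split: 8-bit weights and activations for the propagations, a 16-bit copy of the layer gradients for computing g_W, and float32 SGD updates accumulated into master weights. The sketch below illustrates that split for a single linear layer, reusing the quantize_uniform helper from the previous sketch; it is a simplified illustration of the reported setup, not the released implementation.

```python
# Assumes torch tensors and the quantize_uniform helper defined in the sketch above.

def linear_layer_step(w_fp32, x, grad_out, lr=0.1):
    """One linear-layer backward/update step with the reported precision split:
    8-bit operands for the propagations, a 16-bit gradient copy for the
    weight gradient g_W, and a float32 update into the master weights."""
    w8 = quantize_uniform(w_fp32, num_bits=8)       # 8-bit weights
    x8 = quantize_uniform(x, num_bits=8)            # 8-bit activations
    g16 = quantize_uniform(grad_out, num_bits=16)   # 16-bit copy used for g_W
    grad_w = g16.t() @ x8                           # g_W: gradient w.r.t. weights
    grad_x = grad_out @ w8                          # gradient passed to the previous layer
    w_fp32 -= lr * grad_w                           # float32 update, once per minibatch
    return w_fp32, grad_x

# Example shapes: x is (batch, in_features), w_fp32 is (out_features, in_features),
# grad_out is (batch, out_features).
```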