Scalable methods for 8-bit training of neural networks
Authors: Ron Banner, Itay Hubara, Elad Hoffer, Daniel Soudry
NeurIPS 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our simulations show that Range BN is equivalent to the traditional batch norm if a precise scale adjustment, which can be approximated analytically, is applied. Experiments on ImageNet with ResNet-18 and ResNet-50 showed no distinguishable difference between the accuracy of Range BN and traditional BN. We evaluated the ideas of Range Batch-Norm and Quantized Back-Propagation on multiple different models and datasets. The code to replicate all of our experiments is available online. Experiment results on the CIFAR-10 dataset. Experiment results on the ImageNet dataset. (A Range BN sketch follows the table.) |
| Researcher Affiliation | Collaboration | Ron Banner (1), Itay Hubara (2), Elad Hoffer (2), Daniel Soudry (2); {itayhubara, elad.hoffer, daniel.soudry}@gmail.com; {ron.banner}@intel.com; (1) Intel Artificial Intelligence Products Group (AIPG); (2) Technion - Israel Institute of Technology, Haifa, Israel |
| Pseudocode | No | The paper does not contain any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | The code to replicate all of our experiments is available online: https://github.com/eladhoffer/quantized.pytorch |
| Open Datasets | Yes | To the best of the authors' knowledge, this work is the first to quantize the weights, activations, as well as a substantial volume of the gradients stream, in all layers (including batch normalization) to 8-bit while showing state-of-the-art results over the ImageNet-1K dataset. Experiments on ImageNet with ResNet-18 and ResNet-50 showed no distinguishable difference between the accuracy of Range BN and traditional BN. Experiment results on the CIFAR-10 dataset. |
| Dataset Splits | No | The paper mentions 'validation accuracy' and evaluates on datasets like ImageNet and CIFAR-10, but it does not specify the explicit percentages or sample counts for the training, validation, or test splits. It implicitly relies on standard splits for these public datasets without explicitly stating them. |
| Hardware Specification | Yes | A Titan Xp used for this research was donated by the NVIDIA Corporation. |
| Software Dependencies | No | The paper mentions using the 'GEMMLOWP quantization scheme as described in Google's open-source library [1]' but does not provide specific version numbers for GEMMLOWP or any other software dependencies such as Python or PyTorch. (An illustrative quantizer sketch follows the table.) |
| Experiment Setup | Yes | To validate this low-precision scheme, we quantized the vast majority of operations to 8-bit. The only operations left at higher precision were the updates (float32) needed to accumulate small changes from stochastic gradient descent, and a copy of the layer gradients kept at 16 bits, needed to compute the weight gradients g_W. The float32 updates are done once per minibatch while the propagations are done for each example (e.g., for a minibatch of 256 examples the updates constitute less than 0.4% of the training effort). (A sketch of this update structure follows the table.) |
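
As a reading aid, here is a minimal sketch of the Range BN idea quoted in the Research Type row: the centered activations are normalized by their per-channel range scaled by C(n) = 1/sqrt(2·ln n), which approximates the standard deviation for roughly Gaussian inputs. This is our own paraphrase in PyTorch under assumed names (`range_batch_norm`, `eps`), not the authors' implementation; the exact code is in the linked repository.

```python
import math
import torch

def range_batch_norm(x, gamma, beta, eps=1e-5):
    # x: (N, C, H, W). Normalize each channel by the range of the centered
    # activations scaled by C(n) = 1/sqrt(2*ln n), which approximates the
    # standard deviation for roughly Gaussian inputs.
    n = x.numel() // x.size(1)                       # samples per channel statistic
    mu = x.mean(dim=(0, 2, 3), keepdim=True)         # per-channel mean
    centered = x - mu
    rng = (centered.amax(dim=(0, 2, 3), keepdim=True)
           - centered.amin(dim=(0, 2, 3), keepdim=True))
    c_n = 1.0 / math.sqrt(2.0 * math.log(n))         # analytic scale adjustment
    x_hat = centered / (c_n * rng + eps)
    return gamma.view(1, -1, 1, 1) * x_hat + beta.view(1, -1, 1, 1)
```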
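The GEMMLOWP scheme referenced in the Software Dependencies row is an affine (scale plus zero-point) 8-bit quantizer. The sketch below illustrates that general idea under our own hypothetical helper names (`quantize_uint8`, `dequantize`); it is not the paper's API, and the exact scheme is described in Google's gemmlowp documentation and the authors' repository.

```python
import torch

def quantize_uint8(x):
    # Affine quantization in the spirit of GEMMLOWP: map the tensor's
    # observed [min, max] range onto the 8-bit integer grid [0, 255].
    qmin, qmax = 0, 255
    x_min = min(float(x.min()), 0.0)                  # keep zero exactly representable
    x_max = max(float(x.max()), 0.0)
    scale = max((x_max - x_min) / (qmax - qmin), 1e-8)
    zero_point = int(round(qmin - x_min / scale))
    zero_point = min(max(zero_point, qmin), qmax)
    q = torch.clamp(torch.round(x / scale) + zero_point, qmin, qmax)
    return q.to(torch.uint8), scale, zero_point

def dequantize(q, scale, zero_point):
    # Recover an approximation of the original float tensor.
    return (q.float() - zero_point) * scale
```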
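Finally, a sketch of the update structure quoted in the Experiment Setup row: propagation runs on low-precision tensors, while stochastic gradient descent accumulates into a float32 master copy once per minibatch (for a minibatch of 256, roughly 1/256 ≈ 0.4% of the work). The helper names and the symmetric fake-quantization are our assumptions for illustration, not the authors' code.

```python
import torch

def fake_quant8(t):
    # Round-trip through a signed 8-bit grid (illustrative stand-in for the
    # paper's quantizer).
    scale = t.abs().max().clamp(min=1e-8) / 127.0
    return torch.round(t / scale).clamp(-127, 127) * scale

@torch.no_grad()
def sgd_step(master_weights, low_precision_weights, grads, lr=0.1):
    # Float32 master weights accumulate the small SGD changes once per
    # minibatch; the low-precision copies used during propagation are then
    # refreshed from them.
    for m, w, g in zip(master_weights, low_precision_weights, grads):
        m -= lr * g.float()         # high-precision accumulation
        w.copy_(fake_quant8(m))     # weights seen by forward/backward stay 8-bit
```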