Scalable methods for 8-bit training of neural networks

Authors: Ron Banner, Itay Hubara, Elad Hoffer, Daniel Soudry

NeurIPS 2018 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our simulations show that Range BN is equivalent to the traditional batch norm if a precise scale adjustment, which can be approximated analytically, is applied. Experiments on ImageNet with ResNet-18 and ResNet-50 showed no distinguishable difference in accuracy between Range BN and traditional BN. We evaluated the ideas of Range Batch-Norm and Quantized Back-Propagation on multiple different models and datasets. The code to replicate all of our experiments is available online. Experiment results on the CIFAR-10 dataset. Experiment results on the ImageNet dataset. (A hedged Range BN sketch follows this table.)
Researcher Affiliation | Collaboration | Ron Banner (1), Itay Hubara (2), Elad Hoffer (2), Daniel Soudry (2); {itayhubara, elad.hoffer, daniel.soudry}@gmail.com; {ron.banner}@intel.com; (1) Intel Artificial Intelligence Products Group (AIPG); (2) Technion - Israel Institute of Technology, Haifa, Israel
Pseudocode | No | The paper does not contain any clearly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | The code to replicate all of our experiments is available online: https://github.com/eladhoffer/quantized.pytorch
Open Datasets | Yes | To the best of the authors' knowledge, this work is the first to quantize the weights, activations, as well as a substantial volume of the gradients stream, in all layers (including batch normalization) to 8-bit while showing state-of-the-art results over the ImageNet-1K dataset. Experiments on ImageNet with ResNet-18 and ResNet-50 showed no distinguishable difference in accuracy between Range BN and traditional BN. Experiment results on the CIFAR-10 dataset.
Dataset Splits | No | The paper mentions 'validation accuracy' and evaluates on datasets such as ImageNet and CIFAR-10, but it does not specify explicit percentages or sample counts for the training, validation, or test splits. It implicitly relies on the standard splits of these public datasets without stating them.
Hardware Specification | Yes | A Titan Xp used for this research was donated by the NVIDIA Corporation.
Software Dependencies | No | The paper mentions using the 'GEMMLOWP quantization scheme as described in Google's open source library [1]' but does not provide specific version numbers for GEMMLOWP or any other software dependencies such as Python or PyTorch. (A sketch of GEMMLOWP-style uniform quantization follows this table.)
Experiment Setup | Yes | To validate this low-precision scheme, we quantized the vast majority of operations to 8-bit. The only operations left at higher precision were the updates (float32) needed to accumulate small changes from stochastic gradient descent, and a copy of the layer gradients at 16 bits needed to compute g_W. The float32 updates are done once per minibatch while the propagations are done for each example (e.g., for a minibatch of 256 examples the updates constitute less than 0.4% of the training effort). (A sketch of this precision split follows this table.)
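
The Range BN result quoted above refers to replacing the per-channel standard deviation of batch normalization with the range (max minus min) of the centered activations, scaled analytically by C(n) = 1/sqrt(2 ln n). The PyTorch sketch below is a minimal illustration under that assumption; the module name and interface are ours, not taken from the released quantized.pytorch code, and running statistics for inference are omitted for brevity.

```python
import math
import torch
import torch.nn as nn

class RangeBN2d(nn.Module):
    """Sketch of Range Batch-Norm for (N, C, H, W) inputs: the per-channel
    standard deviation of standard BN is replaced by the per-channel range
    (max - min) of the centered activations, scaled by C(n) = 1/sqrt(2*ln n)."""

    def __init__(self, num_features, eps=1e-5):
        super().__init__()
        self.weight = nn.Parameter(torch.ones(num_features))   # gamma
        self.bias = nn.Parameter(torch.zeros(num_features))    # beta
        self.eps = eps

    def forward(self, x):
        n = x.numel() // x.size(1)                  # samples per channel
        c_n = 1.0 / math.sqrt(2.0 * math.log(n))    # analytic scale adjustment
        xc = x - x.mean(dim=(0, 2, 3), keepdim=True)
        rng = (xc.amax(dim=(0, 2, 3), keepdim=True)
               - xc.amin(dim=(0, 2, 3), keepdim=True))
        x_hat = xc / (c_n * rng + self.eps)
        return x_hat * self.weight.view(1, -1, 1, 1) + self.bias.view(1, -1, 1, 1)
```

Swapping this module in place of nn.BatchNorm2d in a ResNet block is enough to compare the two numerically, which is essentially the comparison the Research Type row describes.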
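
The Software Dependencies row cites the GEMMLOWP quantization scheme. Below is a minimal sketch of that style of uniform (affine) quantization over a tensor's [min, max] range; the function name and the quantize-dequantize round trip are our own illustration for simulation purposes, not GEMMLOWP's API.

```python
import torch

def quantize_uniform(x, num_bits=8):
    """GEMMLOWP-style uniform quantization sketch: map the tensor's
    [min, max] range onto the integer grid [0, 2**num_bits - 1],
    then de-quantize back to float for simulation."""
    qmax = 2 ** num_bits - 1
    x_min, x_max = x.min(), x.max()
    scale = (x_max - x_min).clamp(min=1e-8) / qmax     # step size
    q = ((x - x_min) / scale).round().clamp(0, qmax)   # integer codes
    return q * scale + x_min                           # de-quantized values
```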
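
The Experiment Setup row describes a precision split: 8-bit weights and activations for the propagations, a 16-bit copy of the layer gradients for computing g_W, and float32 SGD updates accumulated into master weights. The sketch below illustrates that split for a single linear layer, reusing the quantize_uniform helper from the previous sketch; it is a simplified illustration of the reported setup, not the released implementation.

```python
# Assumes torch tensors and the quantize_uniform helper defined in the sketch above.

def linear_layer_step(w_fp32, x, grad_out, lr=0.1):
    """One linear-layer backward/update step with the reported precision split:
    8-bit operands for the propagations, a 16-bit gradient copy for the
    weight gradient g_W, and a float32 update into the master weights."""
    w8 = quantize_uniform(w_fp32, num_bits=8)       # 8-bit weights
    x8 = quantize_uniform(x, num_bits=8)            # 8-bit activations
    g16 = quantize_uniform(grad_out, num_bits=16)   # 16-bit copy used for g_W
    grad_w = g16.t() @ x8                           # g_W: gradient w.r.t. weights
    grad_x = grad_out @ w8                          # gradient passed to the previous layer
    w_fp32 -= lr * grad_w                           # float32 update, once per minibatch
    return w_fp32, grad_x

# Example shapes: x is (batch, in_features), w_fp32 is (out_features, in_features),
# grad_out is (batch, out_features).
```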