Training Deep Neural Networks with 8-bit Floating Point Numbers

Authors: Naigang Wang, Jungwook Choi, Daniel Brand, Chia-Yu Chen, Kailash Gopalakrishnan

NeurIPS 2018

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Here we demonstrate, for the first time, the successful training of DNNs using 8-bit floating point numbers while fully maintaining the accuracy on a spectrum of Deep Learning models and datasets. The experimental results are summarized in Table 1, while the detailed convergence curves are shown in Fig. 4.
Researcher Affiliation | Industry | Naigang Wang, Jungwook Choi, Daniel Brand, Chia-Yu Chen and Kailash Gopalakrishnan, IBM T. J. Watson Research Center, Yorktown Heights, NY 10598, USA ({nwang, choij, danbrand, cchen, kailash}@us.ibm.com)
Pseudocode | Yes | Figure 3(a): Reduced-precision dot-product based on accumulation in chunks. Input: {a_i}_{i=1:N}, {b_i}_{i=1:N} (FPmult); Parameter: chunk size CL; Output: sum (FPacc). sum = 0.0; idx = 0; nchunk = N/CL; for n = 1:nchunk { sum_chunk = 0.0; for i = 1:CL { idx++; prd = a_idx * b_idx (in FPmult); sum_chunk += prd (in FPacc) }; sum += sum_chunk (in FPacc) } (A runnable sketch of this accumulation scheme follows the table.)
Open Source Code | No | The software platform is an in-house distributed deep learning framework [7]. No explicit open-source statement or code link was found.
Open Datasets | Yes | To demonstrate the robustness as well as the wide coverage of the proposed FP8 training scheme, we tested it comprehensively on a spectrum of well-known Convolutional Neural Networks (CNNs) and Deep Neural Networks (DNNs) for both image and speech classification tasks across multiple datasets: CIFAR10-CNN ([14]), CIFAR10-ResNet, ImageNet-ResNet18, ImageNet-ResNet50 ([9]), ImageNet-AlexNet ([15]), BN50-DNN ([18]) (details on the network architectures can be found in the supplementary material).
Dataset Splits | No | The paper mentions training on datasets like CIFAR10 and ImageNet and reports test errors, but does not explicitly provide train/validation/test split percentages or sample counts beyond the standard splits for these datasets. Table 1 lists a minibatch size of 128 and 140 epochs for CIFAR10-CNN.
Hardware Specification | No | Reduced-precision emulated experiments were performed using NVIDIA GPUs. No specific GPU/CPU models or further hardware details were provided beyond this general statement.
Software Dependencies | No | The software platform is an in-house distributed deep learning framework [7]. No specific software dependencies with version numbers were found.
Experiment Setup | Yes | The three GEMM computations share the same bit-precision and chunk size: FP8 for input operands and multiplication and FP16 for accumulation with a chunk size of 64. The three AXPY computations use the same bit-precision, FP16, using floating point stochastic rounding. To preserve the dynamic range of the back-propagated error with small magnitude, we adopt the loss-scaling method described in [16]. For all the models tested, a single scaling factor of 1000 was used without loss of accuracy. The GEMM computation for the last layer of the model (typically a small FC layer followed by Softmax) is kept at FP16 for better numerical stability. Finally, for the ImageNet dataset, the input image is represented using FP16 for the ResNet18 and ResNet50 models. All networks are trained using the SGD optimizer via the proposed FP8 training scheme without changes to network architectures, data pre-processing, or hyper-parameters. Table 1 also lists a minibatch size of 128 and 140 epochs for CIFAR10-CNN. (A loss-scaling sketch follows the table.)
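
To make the chunk-based accumulation of Figure 3(a) concrete, below is a minimal NumPy sketch. It is not the authors' code: NumPy has no FP8 dtype, so FP16 stands in for the reduced-precision operands as well as the accumulator, and the default chunk size of 64 mirrors the value the paper reports for its GEMMs.

import numpy as np

def chunked_dot(a, b, chunk_size=64):
    """Dot product accumulated in short chunks at reduced precision.

    Keeping each partial sum small prevents small products from being
    swamped by a large running accumulator value.
    """
    # FP16 stands in for the paper's FP8 operands (NumPy has no FP8 dtype).
    a16 = np.asarray(a, dtype=np.float16)
    b16 = np.asarray(b, dtype=np.float16)

    total = np.float16(0.0)                        # final sum (FPacc)
    n_chunks = int(np.ceil(len(a16) / chunk_size))
    for n in range(n_chunks):
        chunk_sum = np.float16(0.0)                # per-chunk accumulator
        start = n * chunk_size
        stop = min(start + chunk_size, len(a16))
        for i in range(start, stop):
            prd = np.float16(a16[i] * b16[i])      # product in reduced precision
            chunk_sum = np.float16(chunk_sum + prd)
        total = np.float16(total + chunk_sum)      # add chunk result to the total
    return total

The two-level accumulation is the point of the scheme: a single FP16 running sum over a long dot product loses the small addends, whereas chunk-wise sums keep the intermediate magnitudes comparable.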
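
The loss-scaling step in the experiment setup is simple enough to show in a few lines. The sketch below uses PyTorch purely as a stand-in for the authors' in-house framework (an assumption, not their code) and applies the single static scale of 1000 reported in the paper: the loss is multiplied by the scale before backpropagation, and the gradients are divided by the same factor before the SGD update.

import torch

LOSS_SCALE = 1000.0  # single static scaling factor reported in the paper

def sgd_step_with_loss_scaling(model, loss_fn, optimizer, x, y):
    """One training step with static loss scaling (illustrative sketch only).

    Scaling the loss shifts small back-propagated errors up into the
    representable range of the reduced-precision formats; dividing the
    gradients by the same factor restores their true magnitude before
    the weight update.
    """
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    (loss * LOSS_SCALE).backward()           # gradients are now scaled by 1000
    with torch.no_grad():
        for p in model.parameters():
            if p.grad is not None:
                p.grad.div_(LOSS_SCALE)      # undo the scale before the update
    optimizer.step()
    return loss.item()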