FP8 Quantization: The Power of the Exponent
Authors: Andrey Kuzmin, Mart van Baalen, Yuwei Ren, Markus Nagel, Jorn Peters, Tijmen Blankevoort
NeurIPS 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We investigate the effect of the quantization formats on neural network quantization on three levels: 1) Analytically for several common data and weight distributions, 2) practically in INT8 and FP8 post-training quantization (PTQ) settings, and 3) in quantization-aware training (QAT) settings with both INT8 and different FP8 formats. We will show there is a strong agreement between our theoretical results and our practical results on real networks. |
| Researcher Affiliation | Industry | Qualcomm AI Research {akuzmin,mart,ren,markusn,jpeters,tijmen}@qti.qualcomm.com |
| Pseudocode | No | No pseudocode or algorithm blocks were found in the paper. |
| Open Source Code | No | Code will be made available at https://github.com/Qualcomm-AI-research/FP8-quantization |
| Open Datasets | Yes | We experiment on ResNet18 [19], MobileNetV2 [38], and ViT [14] for ImageNet classification [37]; BERT-base [12] for language understanding on the GLUE benchmark [43]; HRNet [39] for semantic segmentation on the Cityscapes dataset [10]; DeepLabV3 [7] for semantic segmentation on the Pascal VOC dataset [16]; and SalsaNext [11] for LIDAR point cloud segmentation on the SemanticKITTI dataset [2]. |
| Dataset Splits | Yes | Following [35] we do not apply batch normalization folding, and re-estimate the batch normalization statistics (running mean and variance) before final validation, as this improved results for every model we considered. |
| Hardware Specification | Yes | Our code is written in PyTorch and all our experiments are performed using NVIDIA Tesla V100 and A100 GPUs. |
| Software Dependencies | No | The paper states "Our code is written in PyTorch" but does not specify a version number or other software dependencies with versions. |
| Experiment Setup | Yes | We train our models for 20 epochs and use Adam for the model parameters and SGD for the quantization parameters. We run experiments with various learning rates for model and quantization parameters, as well as per-tensor and per-channel quantization, and report results for the best learning setup. |
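The FP8 formats compared in the paper are parameterized by the split between exponent and mantissa bits (and the exponent bias). As a concrete illustration of what "quantizing to FP8" means in simulation, the sketch below rounds a full-precision tensor onto an FP8 grid. This is not the authors' released implementation; the function name, the default bias, and the choice not to reserve exponent codes for inf/NaN are assumptions made for illustration.

```python
import torch

def fp8_fake_quant(x, exp_bits=4, man_bits=3, bias=None):
    """Simulated (fake) FP8 quantization: round x onto the grid defined by
    `exp_bits` exponent bits and `man_bits` mantissa bits (plus a sign bit),
    and return the result in full precision. No exponent codes are reserved
    for inf/NaN; the bias is treated as a free parameter."""
    if bias is None:
        bias = 2 ** (exp_bits - 1) - 1          # IEEE-style default bias
    max_exp = 2 ** exp_bits - 1 - bias          # largest unbiased exponent
    min_exp = 1 - bias                          # smallest normal exponent
    max_val = (2 - 2.0 ** -man_bits) * 2.0 ** max_exp

    x = torch.clamp(x, -max_val, max_val)
    # Exponent of the binade each value falls in; subnormals share min_exp.
    exp = torch.floor(torch.log2(torch.abs(x).clamp(min=1e-38)))
    exp = torch.clamp(exp, min=min_exp)
    # Spacing between representable values inside that binade.
    step = 2.0 ** (exp - man_bits)
    return torch.round(x / step) * step

x = torch.randn(4)
print(fp8_fake_quant(x, exp_bits=4, man_bits=3))  # E4M3-style grid
print(fp8_fake_quant(x, exp_bits=5, man_bits=2))  # E5M2-style grid
```

Changing `exp_bits` versus `man_bits` trades dynamic range against relative precision, which is the trade-off the paper studies analytically and empirically against INT8.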
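The QAT setup quoted in the table uses Adam for the model parameters and SGD for the quantization parameters over 20 epochs. The toy sketch below shows one way to wire up that two-optimizer split in PyTorch; the module, the LSQ-style fake quantizer, the parameter-name filter, and the learning rates are illustrative assumptions, not the authors' code.

```python
import torch
import torch.nn as nn

class ToyQuantLinear(nn.Module):
    """Toy QAT layer with a learnable quantization scale (illustrative only)."""
    def __init__(self, in_features, out_features):
        super().__init__()
        self.linear = nn.Linear(in_features, out_features)
        self.scale = nn.Parameter(torch.ones(1))  # quantization parameter

    def forward(self, x):
        # Straight-through estimator on the rounding so that both the
        # activations and the scale receive gradients.
        x_s = torch.clamp(x / self.scale, -128, 127)
        x_r = x_s + (torch.round(x_s) - x_s).detach()
        return self.linear(x_r * self.scale)

model = ToyQuantLinear(16, 10)
quant_params = [p for n, p in model.named_parameters() if "scale" in n]
model_params = [p for n, p in model.named_parameters() if "scale" not in n]

# Adam for the model parameters, SGD for the quantization parameters.
opt_model = torch.optim.Adam(model_params, lr=1e-5)
opt_quant = torch.optim.SGD(quant_params, lr=1e-3)

x, y = torch.randn(8, 16), torch.randint(0, 10, (8,))
for _ in range(20):  # the paper trains for 20 epochs
    loss = nn.functional.cross_entropy(model(x), y)
    opt_model.zero_grad()
    opt_quant.zero_grad()
    loss.backward()
    opt_model.step()
    opt_quant.step()
```

Keeping the two parameter groups in separate optimizers makes it easy to sweep their learning rates independently, which matches the paper's statement that results are reported for the best learning setup.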
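The paper also re-estimates batch normalization running statistics before the final validation instead of folding batch norm. A minimal sketch of such a re-estimation step is given below; the function name, the number of calibration batches, and the use of a cumulative moving average are assumptions, not details taken from the paper.

```python
import torch
import torch.nn as nn

@torch.no_grad()
def reestimate_bn_stats(model, data_loader, num_batches=50):
    """Reset BatchNorm running statistics and re-estimate them with a few
    training-mode forward passes before the final validation pass."""
    bn_types = (nn.BatchNorm1d, nn.BatchNorm2d, nn.BatchNorm3d)
    for m in model.modules():
        if isinstance(m, bn_types):
            m.reset_running_stats()
            m.momentum = None  # None => cumulative moving average in PyTorch
    model.train()
    for i, (images, _) in enumerate(data_loader):
        model(images)
        if i + 1 >= num_batches:
            break
    model.eval()
```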