Accurate Post Training Quantization With Small Calibration Sets
Authors: Itay Hubara, Yury Nahshan, Yair Hanani, Ron Banner, Daniel Soudry
ICML 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We empirically demonstrate that this approach is: (1) much less susceptible to overfitting than the standard fine-tuning approaches, and can be used even on a very small calibration set; and (2) more powerful than previous methods, which only set the activations dynamic ranges. ... Together, these methods yield state-of-the-art results for both vision and text models. |
| Researcher Affiliation | Collaboration | 1Habana Labs An Intel company, Caesarea, Israel 2Department of Electrical Engineering Technion, Haifa, Israel. |
| Pseudocode | No | The paper describes its methods with mathematical formulations and descriptive text but does not include any clearly labeled pseudocode or algorithm blocks; a hedged sketch of the layer-wise AdaQuant objective it describes appears below this table. |
| Open Source Code | Yes | We open-sourced our code https://github.com/papers-submission/CalibTIP. |
| Open Datasets | Yes | In this section, we demonstrate our methods and pipelines on several models and datasets. We first start by analyzing image recognition models such as ResNet18/50, MobileNetV2, which were trained over the ImageNet dataset. Next, we demonstrate our method robustness by applying it on question answering task using the popular BERT model (Devlin et al., 2018), which was fine-tuned on the SQuAD1.1 dataset (Rajpurkar et al., 2016). |
| Dataset Splits | Yes | In all our experiments, we used a small calibration set taken from the training dataset. ... For each method, we measured the top-1 accuracy with respect to the number of samples in the calibration set over five runs and present the mean and standard deviation. ... We use 1,000 samples from the training set as our calibration set. ... Throughout our experiments, we avoided using any augmentation technique and follow the standard (He et al., 2016) validation set preprocessing. |
| Hardware Specification | Yes | As an example, for ResNet50, even the most time-consuming version, seq-AdaQuant, takes less than 5 minutes on one device (GeForce 1080). |
| Software Dependencies | No | The paper does not provide specific version numbers for any software dependencies, libraries, or frameworks used in the experiments. |
| Experiment Setup | Yes | In all our experiments, we used a small calibration set taken from the training dataset. Unless stated otherwise, we applied asymmetric per-channel quantization (i.e. GEMMLOWP (Wu et al., 2016)) with quantized offset (i.e., zero point). ... In all experiments, we used 1000 samples from the training set as our calibration set. Our setting considers only a mixture of 8-bit and 4-bit layers. |
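
The "Pseudocode" row notes that the paper presents AdaQuant only as a mathematical formulation: per layer, find a small weight perturbation V minimizing ||WX − Q(W+V)X||² over calibration data. Below is a minimal PyTorch sketch of that objective for a single `torch.nn.Linear` layer. The symmetric per-tensor quantizer, the straight-through estimator, the optimizer choice, and all names (`adaquant_layer`, `iters`, `lr`) are illustrative assumptions, not the authors' implementation; see the CalibTIP repository for the real one.

```python
import torch

def adaquant_layer(layer_fp, x_calib, n_bits=4, iters=1000, lr=1e-3):
    """AdaQuant-style per-layer calibration sketch for a torch.nn.Linear
    layer: learn a small weight perturbation V (and the bias) so that the
    quantized layer reproduces the FP32 layer's outputs on calibration
    data. Symmetric quantization and the hyperparameters are assumptions."""
    w = layer_fp.weight.detach()
    b = layer_fp.bias.detach().clone() if layer_fp.bias is not None else None
    y_fp = layer_fp(x_calib).detach()            # target: full-precision outputs

    qmax = 2 ** (n_bits - 1) - 1                 # e.g. 7 for 4-bit
    scale = w.abs().max() / qmax                 # fixed per-tensor step size

    v = torch.zeros_like(w, requires_grad=True)  # continuous perturbation V
    params = [v] + ([b.requires_grad_()] if b is not None else [])
    opt = torch.optim.Adam(params, lr=lr)

    for _ in range(iters):
        w_soft = w + v
        w_hard = torch.clamp(torch.round(w_soft / scale), -qmax - 1, qmax) * scale
        w_q = w_soft + (w_hard - w_soft).detach()   # straight-through estimator
        loss = (y_fp - torch.nn.functional.linear(x_calib, w_q, b)).pow(2).mean()
        opt.zero_grad()
        loss.backward()
        opt.step()
    return (w + v).detach(), b
```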
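The "Dataset Splits" row states that calibration data is a small, fixed sample (1,000 images) of the training set, with no augmentation. A minimal sketch of building such a loader for a torchvision-style dataset could look as follows; the uniform sampling without replacement, the fixed seed, and the helper name `make_calibration_loader` are assumptions, since the paper only states the sample count.

```python
import torch
from torch.utils.data import DataLoader, Subset

def make_calibration_loader(train_set, num_samples=1000, batch_size=128, seed=0):
    """Draw a small, fixed calibration subset from the training data.
    Uniform sampling without replacement and the seed are assumptions."""
    g = torch.Generator().manual_seed(seed)
    idx = torch.randperm(len(train_set), generator=g)[:num_samples].tolist()
    return DataLoader(Subset(train_set, idx), batch_size=batch_size, shuffle=False)
```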
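The "Experiment Setup" row specifies asymmetric per-channel quantization with a quantized offset (zero point), i.e. the GEMMLOWP scheme. A NumPy sketch of that scheme is below, under the assumption that the output channel is the first weight axis; the function and variable names are illustrative, not taken from the paper's code.

```python
import numpy as np

def quantize_per_channel_asymmetric(w, n_bits=8):
    """GEMMLOWP-style asymmetric per-channel quantization with a quantized
    zero point. Assumes the output channel is the first axis of `w`."""
    qmin, qmax = 0, 2 ** n_bits - 1
    w2d = w.reshape(w.shape[0], -1)              # one row per output channel
    w_min = w2d.min(axis=1, keepdims=True)
    w_max = w2d.max(axis=1, keepdims=True)
    scale = (w_max - w_min) / (qmax - qmin)
    scale = np.where(scale == 0, 1.0, scale)     # guard constant channels
    zero_point = np.clip(np.round(qmin - w_min / scale), qmin, qmax)
    q = np.clip(np.round(w2d / scale) + zero_point, qmin, qmax)
    w_hat = (q - zero_point) * scale             # dequantize to inspect error
    return q.reshape(w.shape).astype(np.uint8), w_hat.reshape(w.shape)
```

Dequantizing with `(q - zero_point) * scale` makes it easy to measure the per-channel quantization error directly, which is the quantity AdaQuant's layer-wise objective works to reduce.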