Q-BERT: Hessian Based Ultra Low Precision Quantization of BERT
Authors: Sheng Shen, Zhen Dong, Jiayu Ye, Linjian Ma, Zhewei Yao, Amir Gholami, Michael W. Mahoney, Kurt Keutzer
AAAI 2020, pp. 8815-8821
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We extensively test our proposed method on BERT downstream tasks of SST-2, MNLI, CoNLL-03, and SQuAD. We can achieve comparable performance to baseline with at most 2.3% performance degradation, even with ultra-low precision quantization down to 2 bits.
| Researcher Affiliation | Collaboration | University of California at Berkeley, {sheng.s, zhendong, yejiayu, linjian, zheweiy, amirgh, mahoneymw, keutzer}@berkeley.edu. Equal contribution. Work done while interning at Wave Computing.
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide concrete access to source code for the methodology described. |
| Open Datasets | Yes | We extensively test our proposed method on BERT downstream tasks of SST-2, MNLI, CoNLL-03, and SQuAD. Details of the datasets are shown in Appendix. (These are well-known benchmark datasets.)
| Dataset Splits | No | The paper mentions evaluating on the "development set" and using "10% of the entire training dataset" for Hessian calculation, but it does not specify the train/validation/test splits (e.g., percentages or exact counts) for the main experiment evaluation. |
| Hardware Specification | No | The paper does not provide specific hardware details used for running its experiments. It mentions "academic computational resources" but no specific models or specifications. |
| Software Dependencies | No | The paper does not provide specific ancillary software details with version numbers. |
| Experiment Setup | Yes | To set mixed precision to each encoder layer of BERT_BASE, we measure the sensitivity based on Eq. 2... We then perform quantization-aware finetuning based on the selected precision setting. All experiments in Fig. 1 are based on 10 runs and each run uses 10% of the entire training dataset. ...all the models except for Baseline are using 8-bits activation. ...we used 128 groups for both Q-BERT and Q-BERT_MP in Sec. 3.1. (Minimal sketches of the sensitivity measurement, the precision assignment, and the group-wise quantization follow this table.)
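The sensitivity metric referenced as Eq. 2 is built from the top Hessian eigenvalues of each encoder layer, which the paper estimates matrix-free via power iteration on Hessian-vector products. Below is a minimal PyTorch sketch of that estimation under stated assumptions: `loss` is a scalar loss on one batch, `params` holds the parameters of a single encoder layer, and `top_hessian_eigenvalue` is an illustrative name, not from the paper.

```python
import torch

def top_hessian_eigenvalue(loss, params, n_iters=20, tol=1e-4):
    """Estimate the largest Hessian eigenvalue of `loss` w.r.t. `params`
    via power iteration, using only Hessian-vector products."""
    # Random unit-norm starting vector, shaped like the parameters.
    v = [torch.randn_like(p) for p in params]
    norm = torch.sqrt(sum((u * u).sum() for u in v))
    v = [u / norm for u in v]
    # First backward pass keeps the graph so we can differentiate again.
    grads = torch.autograd.grad(loss, params, create_graph=True)
    prev = None
    for _ in range(n_iters):
        # Hessian-vector product: differentiate <grad, v> w.r.t. params.
        gv = sum((g * u).sum() for g, u in zip(grads, v))
        hv = torch.autograd.grad(gv, params, retain_graph=True)
        # Rayleigh quotient v^T H v (v is unit norm) estimates the eigenvalue.
        eig = sum((h * u).sum() for h, u in zip(hv, v)).item()
        norm = torch.sqrt(sum((h * h).sum() for h in hv))
        v = [h / norm for h in hv]
        if prev is not None and abs(eig - prev) < tol * (abs(prev) + 1e-12):
            break
        prev = eig
    return eig
```

Per the quoted setup, Eq. 2 aggregates the mean and standard deviation of such eigenvalue estimates over 10 runs, each on 10% of the training data, and layers with larger values are kept at higher precision.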
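The paper then selects a mixed-precision setting from these per-layer sensitivities but gives no explicit assignment algorithm. The sketch below is one illustrative policy, splitting layers evenly between two bit widths by sensitivity rank; the even split and the `assign_precision` name are assumptions, not the authors' procedure.

```python
def assign_precision(sensitivities, low_bits=2, high_bits=3):
    """Rank layers by Hessian sensitivity; less sensitive layers get fewer bits.
    The even 50/50 split below is purely illustrative."""
    order = sorted(range(len(sensitivities)), key=lambda i: sensitivities[i])
    bits = [0] * len(sensitivities)
    half = len(order) // 2
    for rank, layer in enumerate(order):
        bits[layer] = low_bits if rank < half else high_bits
    return bits
```

In practice the split point would be tuned against the model-size budget reported in the paper's result tables rather than fixed at one half.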
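Finally, the quoted setup mentions group-wise quantization with 128 groups followed by quantization-aware finetuning. The sketch below shows one plausible reading: symmetric uniform fake-quantization with a separate scale per group, wrapped in a straight-through estimator for the backward pass. The exact partitioning of the attention matrices in the paper may differ, and `groupwise_quantize`/`FakeQuant` are illustrative names, not the authors' code.

```python
import torch

def groupwise_quantize(w: torch.Tensor, num_bits: int = 4, num_groups: int = 128):
    """Symmetric uniform fake-quantization with one scale per group."""
    assert w.numel() % num_groups == 0, "groups must evenly divide the tensor"
    qmax = 2 ** (num_bits - 1) - 1
    groups = w.reshape(num_groups, -1)                  # one range per group
    scale = groups.abs().amax(dim=1, keepdim=True).clamp_min(1e-8) / qmax
    q = torch.clamp(torch.round(groups / scale), -qmax - 1, qmax)
    return (q * scale).reshape_as(w)                    # dequantized weights

class FakeQuant(torch.autograd.Function):
    """Straight-through estimator so fake-quantized weights can be finetuned."""
    @staticmethod
    def forward(ctx, w, num_bits, num_groups):
        return groupwise_quantize(w, num_bits, num_groups)

    @staticmethod
    def backward(ctx, grad_out):
        # STE: gradients pass straight through the non-differentiable rounding.
        return grad_out, None, None
```

During quantization-aware finetuning, each layer's weight would be replaced by `FakeQuant.apply(weight, bits_for_layer, 128)`, with `bits_for_layer` taken from the mixed-precision assignment above.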