I-BERT: Integer-only BERT Quantization
Authors: Sehoon Kim, Amir Gholami, Zhewei Yao, Michael W. Mahoney, Kurt Keutzer
ICML 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate our approach on GLUE downstream tasks using RoBERTa Base/Large. We show that for both cases, I-BERT achieves similar (and slightly higher) accuracy as compared to the full-precision baseline. Furthermore, our preliminary implementation of I-BERT shows a speedup of 2.4–4.0× for INT8 inference on a T4 GPU system as compared to FP32 inference. |
| Researcher Affiliation | Academia | University of California, Berkeley. |
| Pseudocode | Yes | Algorithm 1: Integer-only Computation of Second-order Polynomial a(x + b)² + c; Algorithm 2: Integer-only GELU; Algorithm 3: Integer-only Exponential and Softmax; Algorithm 4: Integer-only Square Root. A sketch of the second-order polynomial kernel follows the table. |
| Open Source Code | Yes | The framework has been developed in PyTorch and has been open-sourced (Kim, 2021). |
| Open Datasets | Yes | We evaluate our approach on GLUE downstream tasks using RoBERTa Base/Large. |
| Dataset Splits | Yes | For each of the GLUE downstream tasks, we train both FP32 baseline and integer-only I-BERT models, and evaluate the accuracy on the development set. |
| Hardware Specification | Yes | Furthermore, our preliminary implementation of I-BERT shows a speedup of 2.4–4.0× for INT8 inference on a T4 GPU system as compared to FP32 inference. |
| Software Dependencies | No | The framework has been developed in PyTorch and has been open-sourced (Kim, 2021). Specific version numbers for PyTorch and TensorRT are not provided. |
| Experiment Setup | Yes | See Appendix C.2 and C.3 for training and evaluation details. |
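
The pseudocode row above cites Algorithm 1, which evaluates the second-order polynomial a(x + b)² + c using only integer arithmetic on the quantized input. The sketch below illustrates that idea, assuming the input is already quantized as x ≈ q · S with integer q and floating-point scale S; the function and variable names are illustrative and not taken from the authors' released code.

```python
import numpy as np

def int_poly(q, S, a, b, c):
    """Integer-only evaluation of a*(x + b)^2 + c, where x = q * S.

    Minimal sketch of the I-BERT second-order polynomial kernel:
    the floating-point constants b and c are folded into integer
    offsets so that the activation path uses only integer arithmetic.
    """
    q_b = np.floor(b / S)              # integer offset replacing b
    q_c = np.floor(c / (a * S ** 2))   # integer offset replacing c
    S_out = a * S ** 2                 # output scale, precomputed offline
    q_out = (q + q_b) ** 2 + q_c       # integer-only computation
    return q_out, S_out

# Example: x = 0.8 represented as q = 80 with scale S = 0.01
q_out, S_out = int_poly(np.array([80]), 0.01, a=0.5, b=1.0, c=0.25)
print(q_out * S_out)  # ~ 0.5 * (0.8 + 1.0)^2 + 0.25 = 1.87
```

The constants are folded into the integer offsets q_b and q_c and into the output scale S_out ahead of time, so inference itself touches only integers; this polynomial kernel is the building block the paper reuses for its integer-only GELU, Exponential/Softmax, and Square Root approximations.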