BiLLM: Pushing the Limit of Post-Training Quantization for LLMs
Authors: Wei Huang, Yangdong Liu, Haotong Qin, Ying Li, Shiming Zhang, Xianglong Liu, Michele Magno, Xiaojuan Qi
ICML 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments demonstrate that BiLLM achieves state-of-the-art (SOTA) performance for LLMs across multiple LLM families on various evaluation metrics, and first achieves an extremely compact 1.07~1.11 bit-width on average for PTQ binarization. |
| Researcher Affiliation | Academia | ¹The University of Hong Kong, ²Beihang University, ³ETH Zürich. |
| Pseudocode | Yes | Algorithm 1 illustrates the complete process of BiLLM, and the detailed implementation of BiLLM is shown in Appendix A. Algorithm 2 BiLLM: Detailed functions process |
| Open Source Code | Yes | Our code is available at https://github.com/Aaronhuang-778/BiLLM. |
| Open Datasets | Yes | We consider the test of WikiText2 (Merity et al., 2016), PTB (Marcus et al., 1994), as well as a part of the C4 (Raffel et al., 2020) data. |
| Dataset Splits | No | The paper does not specify training, validation, and test dataset splits (e.g., 80/10/10 split or specific sample counts for each split). |
| Hardware Specification | Yes | All the binarization processes and experiments are conducted on a single 80 GB NVIDIA A100. |
| Software Dependencies | No | We deploy BiLLM within the PyTorch (Paszke et al., 2019) and Huggingface (Wolf et al., 2019) libraries. |
| Experiment Setup | Yes | We deploy BiLLM on the OPT models (Zhang et al., 2022) under the condition of a block size equal to 128. Algorithm 1 func BinaryLLM(W, X, β, λ). Input: W ∈ R^{n×m} weight matrix, X ∈ R^{r×d} calibration data, β block size, λ Hessian regularizer. Output: B binarized weights. (An illustrative sketch of this block-wise interface follows the table.) |
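
To make the quoted `BinaryLLM(W, X, β, λ)` interface concrete, below is a minimal, hypothetical PyTorch sketch of a block-wise binarization loop with block size β = 128. It is not the paper's full Algorithm 1: it omits the Hessian-based salient-weight selection driven by the calibration data X and regularizer λ, the residual approximation of salient weights, and the bell-shaped splitting of non-salient weights. It only shows the sign-and-scale binarization applied block by block over the weight matrix, and all helper names are illustrative.

```python
import torch


def binarize_block(w: torch.Tensor) -> torch.Tensor:
    """1-bit approximation of a weight block: alpha = mean(|w|), B = sign(w),
    which minimizes ||w - alpha * B||^2 for a single shared scale."""
    alpha = w.abs().mean()
    return alpha * torch.sign(w)


def binary_llm_sketch(W: torch.Tensor, beta: int = 128) -> torch.Tensor:
    """Illustrative block-wise loop over the columns of W (block size beta).

    NOTE: this is a simplified stand-in for the paper's BinaryLLM(W, X, beta, lambda);
    the calibration data X and Hessian regularizer lambda are not used here.
    """
    n, m = W.shape
    W_hat = torch.empty_like(W)
    for start in range(0, m, beta):
        end = min(start + beta, m)
        W_hat[:, start:end] = binarize_block(W[:, start:end])
    return W_hat


if __name__ == "__main__":
    # Toy usage: binarize a random weight matrix and report reconstruction error.
    W = torch.randn(768, 768)
    W_hat = binary_llm_sketch(W, beta=128)
    print("mean squared reconstruction error:", (W - W_hat).pow(2).mean().item())
```

The block size of 128 matches the experiment setup quoted above; in the actual method, each block would additionally be split into salient and non-salient groups before binarization.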