BiLLM: Pushing the Limit of Post-Training Quantization for LLMs

Authors: Wei Huang, Yangdong Liu, Haotong Qin, Ying Li, Shiming Zhang, Xianglong Liu, Michele Magno, Xiaojuan Qi

ICML 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments demonstrate that BiLLM achieves state-of-the-art (SOTA) performance for LLMs across multiple LLM families on various evaluation metrics, and first achieves an extremely compact 1.07–1.11 bit-width on average for PTQ binarization.
Researcher Affiliation | Academia | 1 The University of Hong Kong, 2 Beihang University, 3 ETH Zürich.
Pseudocode | Yes | Algorithm 1 illustrates the complete process of BiLLM, and the detailed implementation of BiLLM is shown in Appendix A. Algorithm 2 BiLLM: detailed functions process.
Open Source Code | Yes | Our code is available at https://github.com/Aaronhuang-778/BiLLM.
Open Datasets | Yes | We consider the test of WikiText2 (Merity et al., 2016), PTB (Marcus et al., 1994), as well as a part of the C4 (Raffel et al., 2020) data.
Dataset Splits | No | The paper does not specify training, validation, and test dataset splits (e.g., an 80/10/10 split or specific sample counts for each split).
Hardware Specification | Yes | All the binarization processes and experiments are conducted on a single 80 GB NVIDIA A100.
Software Dependencies | No | We deploy BiLLM within the PyTorch (Paszke et al., 2019) and Huggingface (Wolf et al., 2019) libraries.
Experiment Setup | Yes | We deploy BiLLM on the OPT models (Zhang et al., 2022) under the condition of a block size equal to 128. Algorithm 1 func BinaryLLM(W, X, β, λ). Input: W ∈ ℝ^(n×m) weight matrix, X ∈ ℝ^(r×d) calibration data, β block size, λ Hessian regularizer. Output: B binarized weights. (A minimal interface sketch follows the table.)
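
For orientation, the sketch below mirrors only the Algorithm 1 interface reported above (weight matrix W binarized column-block by column-block with block size β = 128). It is a minimal assumption-laden stand-in, not the authors' method: it applies plain sign-and-scale binarization per block and omits BiLLM's Hessian-guided salient-weight selection, residual approximation, bell-shaped splitting of non-salient weights, and any use of the calibration data X or regularizer λ. The function names binary_llm_block and binary_llm are hypothetical.

    import torch

    def binary_llm_block(W: torch.Tensor) -> torch.Tensor:
        # Plain per-block binarization: alpha * sign(W), where the scalar
        # alpha = mean(|W|) minimizes ||W - alpha * sign(W)||_F. This is a
        # generic baseline, not BiLLM's structural salient/non-salient scheme.
        alpha = W.abs().mean()
        return alpha * torch.sign(W)

    def binary_llm(W: torch.Tensor, beta: int = 128) -> torch.Tensor:
        # Column-blockwise loop matching the reported block size beta = 128.
        # The calibration data X and Hessian regularizer lambda from
        # Algorithm 1 are omitted here (no error compensation is performed).
        B = torch.empty_like(W)
        for start in range(0, W.shape[1], beta):
            end = min(start + beta, W.shape[1])
            B[:, start:end] = binary_llm_block(W[:, start:end])
        return B

Usage would amount to calling binary_llm(layer.weight.data) on each linear layer's weight matrix; the paper's full pipeline additionally uses the calibration set to form Hessian information per block and to compensate quantization error, which this sketch deliberately leaves out.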