OneBit: Towards Extremely Low-bit Large Language Models
Authors: Yuzhuang Xu, Xu Han, Zonghan Yang, Shuo Wang, Qingfu Zhu, Zhiyuan Liu, Weidong Liu, Wanxiang Che
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Sufficient experimental results indicate that OneBit achieves good performance (at least 81% of the non-quantized performance on LLaMA models) with robust training processes when using only 1-bit weight matrices. |
| Researcher Affiliation | Academia | Yuzhuang Xu1, Xu Han2, Zonghan Yang2, Shuo Wang2, Qingfu Zhu1, Zhiyuan Liu2, Weidong Liu2, Wanxiang Che1 (corresponding author); 1Research Center for Social Computing and Information Retrieval, Harbin Institute of Technology, Harbin, China; 2Department of Computer Science & Technology, Tsinghua University, Beijing, China |
| Pseudocode | No | No pseudocode or algorithm blocks were found. |
| Open Source Code | Yes | Code and checkpoints are available at https://github.com/xuyuzhuang11/OneBit |
| Open Datasets | Yes | We evaluate our approach by performing experiments on OPT-1.3B/2.7B models, LLaMA-7B/13B models, and LLaMA2-7B/13B models, and present results on various tasks. ... Specifically, on WikiText2 [23] and C4 [28]. |
| Dataset Splits | Yes | Basically, we evaluate quantized models by testing the perplexity on the validation set, specifically on WikiText2 [23] and C4 [28]. |
| Hardware Specification | Yes | We use NVIDIA A100 GPUs and maintain FP16 precision while training quantized models. |
| Software Dependencies | No | We employ NMF in scikit-learn to decompose the weight matrices in SVID. (See the decomposition sketch below the table.) |
| Experiment Setup | Yes | Every KD experiment learns the training data over 50 epochs, from which 2048-token segments are selected. We employ NMF in scikit-learn to decompose the weight matrices in SVID. The quantized student models are optimized by Adam [19] with β1 = 0.9, β2 = 0.98. The learning rate for all experiments is scheduled by a cosine strategy. We use NVIDIA A100 GPUs and maintain FP16 precision while training quantized models. For additional details such as learning rate, please refer to Table 1. (See the training-setup sketch below the table.) |
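The Software Dependencies row quotes the paper's use of scikit-learn's NMF to decompose weight matrices in SVID (the paper's sign-value decomposition into a ±1 sign matrix and rank-1 value vectors). Below is a minimal sketch of that decomposition step under those assumptions; the function and variable names are illustrative, not taken from the released code.

```python
# A minimal sketch of the SVID-style decomposition quoted above: the magnitude of a
# weight matrix is factorized with scikit-learn's rank-1 NMF so that
# W ≈ sign(W) ⊙ (a @ b.T). Function and variable names are illustrative only.
import numpy as np
from sklearn.decomposition import NMF

def svid_like_decompose(weight: np.ndarray):
    """Split a weight matrix into a ±1 sign matrix and two rank-1 value vectors."""
    sign = np.sign(weight)
    sign[sign == 0] = 1.0                      # avoid zeros in the sign matrix
    magnitude = np.abs(weight)                 # NMF requires non-negative input
    nmf = NMF(n_components=1, init="nndsvd", max_iter=500)
    a = nmf.fit_transform(magnitude)           # shape: (out_features, 1)
    b = nmf.components_.T                      # shape: (in_features, 1)
    return sign, a, b

# Usage: reconstruct a random matrix from its sign matrix and rank-1 value factors.
rng = np.random.default_rng(0)
w = rng.standard_normal((256, 512)).astype(np.float32)
sign, a, b = svid_like_decompose(w)
approx = sign * (a @ b.T)
print("relative error:", np.linalg.norm(w - approx) / np.linalg.norm(w))
```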
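The Experiment Setup row also fixes the optimizer and schedule: Adam with β1 = 0.9, β2 = 0.98 and a cosine learning-rate schedule, trained in FP16 on A100 GPUs. The PyTorch sketch below shows how those quoted hyperparameters map onto a training loop; the stand-in model, learning rate, and step count are placeholder assumptions (the paper's Table 1 holds the actual values).

```python
# A minimal PyTorch sketch of the quoted optimizer settings: Adam with
# betas=(0.9, 0.98) and a cosine learning-rate schedule. The model, learning
# rate, and step count are placeholders, not the paper's actual configuration.
import torch

model = torch.nn.Linear(4096, 4096)            # stand-in for the 1-bit student model
optimizer = torch.optim.Adam(model.parameters(), lr=2e-4, betas=(0.9, 0.98))
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=1_000)

for step in range(1_000):
    x = torch.randn(8, 4096)
    loss = model(x).pow(2).mean()              # placeholder for the KD loss
    loss.backward()
    optimizer.step()
    scheduler.step()
    optimizer.zero_grad()
```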