OneBit: Towards Extremely Low-bit Large Language Models

Authors: Yuzhuang Xu, Xu Han, Zonghan Yang, Shuo Wang, Qingfu Zhu, Zhiyuan Liu, Weidong Liu, Wanxiang Che

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Sufficient experimental results indicate that OneBit achieves good performance (at least 81% of the non-quantized performance on LLaMA models) with robust training processes when only using 1-bit weight matrices. (A sketch of such a 1-bit layer follows the table.)
Researcher Affiliation | Academia | Yuzhuang Xu¹, Xu Han², Zonghan Yang², Shuo Wang², Qingfu Zhu¹, Zhiyuan Liu², Weidong Liu², Wanxiang Che¹ (corresponding author). ¹Research Center for Social Computing and Information Retrieval, Harbin Institute of Technology, Harbin, China. ²Department of Computer Science & Technology, Tsinghua University, Beijing, China.
Pseudocode | No | No pseudocode or algorithm blocks were found.
Open Source Code | Yes | Code and checkpoints are available at https://github.com/xuyuzhuang11/OneBit
Open Datasets | Yes | We evaluate our approach by performing experiments on OPT-1.3B/2.7B models, LLaMA-7B/13B models and LLaMA2-7B/13B models, and present results on various tasks. ... Specifically, on WikiText2 [23] and C4 [28].
Dataset Splits | Yes | Basically, we evaluate quantized models by testing the perplexity on the validation set, specifically on WikiText2 [23] and C4 [28].
Hardware Specification | Yes | We use NVIDIA A100 GPUs and maintain FP16 precision while training quantized models.
Software Dependencies | No | We employ NMF in scikit-learn to decompose the weight matrices in SVID. (A sketch of this decomposition follows the table.)
Experiment Setup | Yes | Every KD experiment learns the training data over 50 epochs, from which 2048-token segments are selected. We employ NMF in scikit-learn to decompose the weight matrices in SVID. The quantized student models are optimized by Adam [19] with β₁ = 0.9, β₂ = 0.98. The learning rate for all experiments is scheduled by a cosine strategy. We use NVIDIA A100 GPUs and maintain FP16 precision while training quantized models. For additional details such as learning rate, please refer to Table 1. (A PyTorch sketch of this optimizer setup follows the table.)
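To make the "1-bit weight matrices" claim concrete, below is a minimal PyTorch sketch of a OneBit-style linear layer, assuming (per the paper's description) that each weight matrix is replaced by a fixed ±1 sign matrix plus two trainable value vectors that rescale input and output features. The class name, initialization, and forward-pass details are illustrative assumptions, not the authors' implementation; the released repository contains the real code.

```python
import torch
import torch.nn as nn

class OneBitLinearSketch(nn.Module):
    """Illustrative 1-bit linear layer (not the authors' code).

    Weights are a fixed ±1 sign matrix; expressiveness is recovered by
    two trainable value vectors, g (over input features) and h (over
    output features), which the paper keeps in FP16.
    """

    def __init__(self, in_features: int, out_features: int):
        super().__init__()
        # ±1 matrix; a real implementation would bit-pack this buffer.
        w = torch.randn(out_features, in_features)
        self.register_buffer("w_sign", torch.where(w >= 0, 1.0, -1.0))
        self.g = nn.Parameter(torch.ones(in_features))
        self.h = nn.Parameter(torch.ones(out_features))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Scale inputs by g, multiply by the sign matrix, scale outputs by h.
        return ((x * self.g) @ self.w_sign.t()) * self.h

# Usage: shapes only; weights here are random, not distilled.
layer = OneBitLinearSketch(128, 64)
y = layer(torch.randn(4, 128))
print(y.shape)  # torch.Size([4, 64])
```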
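The Software Dependencies and Experiment Setup rows quote the paper's use of NMF from scikit-learn to decompose weight matrices in SVID. Below is a hedged sketch of that step, assuming SVID factors a weight matrix W into its sign matrix times a rank-1 nonnegative approximation of |W|; the function name `svid` and the NMF settings (`init`, `max_iter`) are our own choices, not values reported in the paper.

```python
import numpy as np
from sklearn.decomposition import NMF

def svid(W: np.ndarray):
    """Sign-Value-Independent Decomposition (sketch).

    Splits W into a ±1 sign matrix and a rank-1 approximation of |W|
    from single-component NMF, so that W ≈ sign(W) * outer(a, b).
    """
    W_sign = np.where(W >= 0, 1.0, -1.0)  # ±1 sign matrix
    nmf = NMF(n_components=1, init="nndsvd", max_iter=500)
    A = nmf.fit_transform(np.abs(W))      # shape (m, 1), nonnegative input
    B = nmf.components_                   # shape (1, n)
    return W_sign, A[:, 0], B[0, :]

# Usage: check the relative reconstruction error on a random matrix.
W = np.random.randn(64, 128).astype(np.float32)
W_sign, a, b = svid(W)
W_hat = W_sign * np.outer(a, b)
print(np.linalg.norm(W - W_hat) / np.linalg.norm(W))
```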
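Finally, the quoted optimization setup (Adam with β₁ = 0.9, β₂ = 0.98 and a cosine learning-rate schedule) maps onto standard PyTorch components. The sketch below shows one plausible wiring; the model, learning rate, and step count are placeholders, since the paper defers those details to its Table 1.

```python
import torch
from torch.optim import Adam
from torch.optim.lr_scheduler import CosineAnnealingLR

# Placeholder stand-in for the 1-bit quantized student model.
model = torch.nn.Linear(1024, 1024)

# Betas match the quoted setup; lr is a placeholder (per-model values
# appear in the paper's Table 1).
optimizer = Adam(model.parameters(), lr=2e-4, betas=(0.9, 0.98))

total_steps = 100  # placeholder; the paper trains for 50 epochs of KD data
scheduler = CosineAnnealingLR(optimizer, T_max=total_steps)

for step in range(total_steps):
    optimizer.zero_grad()
    x = torch.randn(8, 1024)
    loss = model(x).pow(2).mean()  # dummy loss standing in for the KD objective
    loss.backward()
    optimizer.step()
    scheduler.step()
```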