QBB: Quantization with Binary Bases for LLMs

Authors: Adrian Bulat, Yassine Ouali, Georgios Tzimiropoulos

NeurIPS 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | When evaluated across multiple LLM families, our approach matches and outperforms all prior works, setting a new state-of-the-art result using a summation-only based approach.
Researcher Affiliation | Collaboration | Adrian Bulat (1,2), Yassine Ouali (1), Georgios Tzimiropoulos (1,3); 1: Samsung AI Cambridge, 2: Technical University of Iasi, 3: Queen Mary University of London
Pseudocode | No | The paper illustrates processes with figures (Fig. 1, Fig. 2) but does not contain structured pseudocode or algorithm blocks.
Open Source Code | No | No code was included with the paper at submission time.
Open Datasets | Yes | We compare our approach with the current state-of-the-art for low-bit quantization in terms of perplexity score on the main benchmark for quantization WikiText2 [41], focusing mainly on the LLaMA-2 [53] {7, 13, 70}B family of models. However, we also include results for LLaMA [52] {7, 13, 30, 65}B and Phi-2 [23] 2.7B models. [see the perplexity-evaluation sketch after the table]
Dataset Splits | No | The paper mentions using WikiText2 for evaluation but does not explicitly provide the specific training, validation, and test dataset splits used for its experiments, nor does it cite a source that defines the exact splits within the paper itself.
Hardware Specification | Yes | During the input-agnostic quantization part, presented in Sec. 3.1 and using a single A100 GPU, we optimize each set of binary matrices and scaling vectors, layer by layer... [see the decomposition sketch after the table]
Software Dependencies | No | The paper states 'We implement our method using PyTorch [43]' but does not provide a specific version number for PyTorch or other software dependencies.
Experiment Setup | Yes | During the input-agnostic quantization part... using the following hyperparameters: Adam optimizer [28], 15000 iterations, no weight decay, an initial learning rate of 1e-4 decayed to 0 using a cosine scheduler. For the data-free distillation step... we fine-tune the scaling vectors only for 2 epochs using an Adam optimizer, a cosine learning rate scheduler, no weight decay, and an initial learning rate set to 2.5e-4. For added stability, we clip the gradients with a norm higher than 1. [see the training-setup sketch after the table]
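
The Open Datasets row reports perplexity on WikiText2 for the LLaMA / LLaMA-2 and Phi-2 models. The paper excerpt quoted here does not include the evaluation script, so the following is only a minimal sketch of the common non-overlapping-window perplexity protocol on the WikiText2 test split, using Hugging Face transformers and datasets; the checkpoint name and the 2048-token context length are assumptions, not values stated in the paper.

```python
# Hedged sketch of WikiText2 perplexity evaluation (non-overlapping windows).
# The checkpoint name and seq_len are assumptions; the paper's exact protocol
# is not reproduced in this report.
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-2-7b-hf"  # assumed evaluation checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)
model.eval()

# Tokenize the WikiText2 test split as one long token stream.
test = load_dataset("wikitext", "wikitext-2-raw-v1", split="test")
enc = tokenizer("\n\n".join(test["text"]), return_tensors="pt")

seq_len = 2048  # assumed evaluation context length
nll_sum, n_tokens = 0.0, 0
for start in range(0, enc.input_ids.size(1), seq_len):
    ids = enc.input_ids[:, start:start + seq_len].to(model.device)
    if ids.size(1) < 2:  # need at least one predicted token
        break
    with torch.no_grad():
        # With labels == inputs, the model shifts internally and returns the
        # mean NLL over ids.size(1) - 1 predicted tokens.
        loss = model(ids, labels=ids).loss
    nll_sum += loss.float().item() * (ids.size(1) - 1)
    n_tokens += ids.size(1) - 1

print(f"WikiText2 perplexity: {torch.exp(torch.tensor(nll_sum / n_tokens)).item():.2f}")
```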
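
The Hardware Specification row quotes the layer-by-layer optimization of "binary matrices and scaling vectors" on a single A100. The sketch below shows one plausible parameterization, assuming the weight is approximated as a sum of {-1, +1} bases, each scaled by a per-output-channel vector, and trained with a straight-through estimator against a weight-reconstruction loss; the exact form used by QBB may differ, and all names here are illustrative.

```python
# Hedged sketch of a sum-of-binary-bases weight approximation, assuming
# W_hat = sum_k alpha_k[:, None] * B_k with B_k in {-1, +1} and alpha_k a
# per-output-channel scaling vector. The exact QBB parameterization may differ.
import torch
import torch.nn as nn

def binarize_ste(x: torch.Tensor) -> torch.Tensor:
    # Forward: entries in {-1, +1}; backward: straight-through (identity) gradient.
    b = torch.where(x >= 0, torch.ones_like(x), -torch.ones_like(x))
    return b.detach() + x - x.detach()

class BinaryBases(nn.Module):
    def __init__(self, weight: torch.Tensor, num_bases: int = 3):
        super().__init__()
        # Latent real-valued bases of shape (K, out, in), binarized on the fly.
        self.latent = nn.Parameter(weight.detach().repeat(num_bases, 1, 1) / num_bases)
        # One scaling vector per basis and per output channel, shape (K, out).
        self.alpha = nn.Parameter(
            weight.detach().abs().mean(dim=1).repeat(num_bases, 1) / num_bases
        )

    def reconstruct(self) -> torch.Tensor:
        bases = binarize_ste(self.latent)                  # (K, out, in)
        return (self.alpha.unsqueeze(-1) * bases).sum(0)   # (out, in)

# Layer-by-layer, input-agnostic objective: match the original weight matrix.
# w = some_linear_layer.weight.detach()
# qbb = BinaryBases(w, num_bases=3)
# loss = (qbb.reconstruct() - w).pow(2).mean()
```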
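
The Experiment Setup row specifies Adam with no weight decay and a cosine schedule decayed to 0: 15000 iterations at an initial learning rate of 1e-4 for the input-agnostic stage, and 2 epochs at 2.5e-4 with gradient clipping at norm 1 for the data-free distillation stage. The sketch below wires those settings together in PyTorch; the parameter groups and loss functions are placeholders, not quantities from the paper.

```python
# Hedged sketch of the quoted optimization settings in PyTorch. Only the
# optimizer, schedule, learning rates, and clipping value come from the
# Experiment Setup row; `params` and `loss_fn` are placeholders.
import torch

def run_stage(params, loss_fn, num_steps: int, lr: float, clip_grad: bool = False):
    params = list(params)  # keep a reusable handle for clipping
    optimizer = torch.optim.Adam(params, lr=lr, weight_decay=0.0)  # no weight decay
    # Cosine schedule decaying the learning rate to 0 over the stage.
    scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=num_steps, eta_min=0.0)
    for _ in range(num_steps):
        optimizer.zero_grad()
        loss = loss_fn()
        loss.backward()
        if clip_grad:
            # "we clip the gradients with a norm higher than 1"
            torch.nn.utils.clip_grad_norm_(params, max_norm=1.0)
        optimizer.step()
        scheduler.step()

# Stage 1 (input-agnostic quantization): 15000 iterations, lr 1e-4 decayed to 0.
#   run_stage(qbb.parameters(), reconstruction_loss, num_steps=15000, lr=1e-4)
# Stage 2 (data-free distillation, scaling vectors only): lr 2.5e-4, clipping at norm 1;
#   the step count corresponds to 2 epochs over the distillation data (length not given here).
#   run_stage(scale_params, distillation_loss, num_steps=two_epochs_in_steps, lr=2.5e-4, clip_grad=True)
```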