LQER: Low-Rank Quantization Error Reconstruction for LLMs

Authors: Cheng Zhang, Jianyi Cheng, George Anthony Constantinides, Yiren Zhao

ICML 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "4. Experiments"
Researcher Affiliation | Academia | "1 Department of Electrical and Electronic Engineering, Imperial College London, London, United Kingdom; 2 Department of Computer Science and Technology, University of Cambridge, Cambridge, United Kingdom."
Pseudocode | No | The paper describes its methods using prose and mathematical equations but does not include structured pseudocode or algorithm blocks.
Open Source Code | Yes | "We open-sourced our framework at github.com/ChengZhang-98/lqer."
Open Datasets | Yes | "We report the perplexity on WikiText-2 (Merity et al., 2016) and the accuracy on ARC (easy) (Clark et al., 2018), ARC (challenge) (Clark et al., 2018), LAMBADA (Paperno et al., 2016), PIQA (Bisk et al., 2020), OpenBookQA (Mihaylov et al., 2018), and BoolQ (Clark et al., 2019) using the lm-eval-harness evaluation flow (Gao et al., 2023)... We create a subset of SlimPajama (Soboleva et al., 2023) with Wikipedia texts excluded as the calibration dataset."
Dataset Splits | No | The paper mentions the datasets used for evaluation and a calibration dataset, but it does not explicitly specify training, validation, and test splits with percentages or sample counts.
Hardware Specification | Yes | "The calibration and quantization of LLaMA-33B takes around 1.2 hours in total on a single NVIDIA A100 GPU."
Software Dependencies | No | The paper mentions the 'lm-eval-harness evaluation flow' but does not specify its version or any other software dependencies with explicit version numbers.
Experiment Setup | Yes | "We use MXINT as the number format of LQER if not specified. In Section 4.3, we use W4A8 L2QER with k = 32 to compare with both 4-bit w-only and 4-/6-/8-bit w&a quantization methods. In Section 4.4, we use W2A8 L2QER with k = 256 to compare with 2-bit w-only quantization methods... The block size of MXINT is the default [1, 16] in the original paper (Darvish Rouhani et al., 2020) for Xq ([16, 1] for Wq, Ak, and Bk)."
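To make the experiment-setup row above concrete, here is a minimal sketch of the low-rank quantization error reconstruction idea behind LQER: quantize a weight matrix, take an SVD of the quantization error, and keep only the top-k factors as high-precision correction matrices Ak and Bk. The symmetric integer quantizer below is a simplified stand-in for the block-wise MXINT format used in the paper, the sketch omits the activation-aware scaling that distinguishes L2QER from plain LQER, and all names are illustrative rather than taken from the released code.

```python
import torch

def fake_int_quantize(w: torch.Tensor, n_bits: int = 4) -> torch.Tensor:
    """Symmetric per-tensor fake quantization (a simplified stand-in for the
    block-wise MXINT format the paper actually uses)."""
    qmax = 2 ** (n_bits - 1) - 1
    scale = w.abs().max() / qmax
    return (w / scale).round().clamp(-qmax - 1, qmax) * scale

def lqer_decompose(w: torch.Tensor, n_bits: int = 4, k: int = 32):
    """Return (w_q, a_k, b_k) such that w ~= w_q + a_k @ b_k.

    a_k @ b_k is the rank-k SVD approximation of the quantization error
    w - w_q, kept in higher precision as two small matrices.
    """
    w_q = fake_int_quantize(w, n_bits)
    err = w - w_q                                   # quantization error E
    u, s, vh = torch.linalg.svd(err, full_matrices=False)
    a_k = u[:, :k] * s[:k].sqrt()                   # [out_features, k]
    b_k = s[:k].sqrt().unsqueeze(1) * vh[:k]        # [k, in_features]
    return w_q, a_k, b_k

# Forward pass of a corrected linear layer: the low-rank branch adds two
# skinny matmuls alongside the quantized weight matmul.
w = torch.randn(4096, 4096)
x = torch.randn(1, 4096)
w_q, a_k, b_k = lqer_decompose(w, n_bits=4, k=32)
y = x @ w_q.T + (x @ b_k.T) @ a_k.T                 # ~= x @ w.T
```

The paper's L2QER variant additionally scales the error with activation statistics gathered on the calibration set before the SVD, so that the retained singular directions better match the layer's typical inputs.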
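The Open Datasets row references the lm-eval-harness evaluation flow (Gao et al., 2023). As a rough illustration only, not the authors' actual pipeline, a recent harness release (v0.4+) exposes a Python entry point along these lines; the checkpoint name is a placeholder and task names may differ between harness versions.

```python
import lm_eval

# Placeholder HuggingFace checkpoint; the paper evaluates its own quantized models.
results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=meta-llama/Llama-2-7b-hf,dtype=float16",
    tasks=["wikitext", "arc_easy", "arc_challenge",
           "lambada_openai", "piqa", "openbookqa", "boolq"],
)
print(results["results"])
```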