LQER: Low-Rank Quantization Error Reconstruction for LLMs
Authors: Cheng Zhang, Jianyi Cheng, George Anthony Constantinides, Yiren Zhao
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | 4. Experiments |
| Researcher Affiliation | Academia | (1) Department of Electrical and Electronic Engineering, Imperial College London, London, United Kingdom; (2) Department of Computer Science and Technology, University of Cambridge, Cambridge, United Kingdom. |
| Pseudocode | No | The paper describes its methods using prose and mathematical equations but does not include structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | We open-sourced our framework at github.com/ChengZhang-98/lqer. |
| Open Datasets | Yes | We report the perplexity on WikiText-2 (Merity et al., 2016) and the accuracy on ARC (easy) (Clark et al., 2018), ARC (challenge) (Clark et al., 2018), LAMBADA (Paperno et al., 2016), PIQA (Bisk et al., 2020), OpenBookQA (Mihaylov et al., 2018), and BoolQ (Clark et al., 2019) using the lm-eval-harness evaluation flow (Gao et al., 2023)... We create a subset of SlimPajama (Soboleva et al., 2023) with Wikipedia texts excluded as the calibration dataset. |
| Dataset Splits | No | The paper mentions using datasets for evaluation and a calibration dataset, but it does not explicitly specify the training, validation, and test dataset splits with percentages or sample counts. |
| Hardware Specification | Yes | The calibration and quantization of LLaMA-33B takes around 1.2 hours in total on a single NVIDIA A100 GPU. |
| Software Dependencies | No | The paper mentions 'lm-eval-harness evaluation flow' but does not specify its version or any other software dependencies with explicit version numbers. |
| Experiment Setup | Yes | We use MXINT as the number format of LQER if not specified. In Section 4.3, we use W4A8 L2QER with k = 32 to compare with both 4-bit w-only and 4-/6-/8-bit w&a quantization methods. In Section 4.4, we use W2A8 L2QER with k = 256 to compare with 2-bit w-only quantization methods... The block size of MXINT is the default [1, 16] in the original paper (Darvish Rouhani et al., 2020) for Xq ([16, 1] for Wq, Ak, and Bk). A minimal sketch of this decomposition follows the table. |
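
Since the paper provides no pseudocode (see the Pseudocode row above), the following is a minimal sketch of the core LQER idea as described in the paper: quantize the weight matrix, take the quantization error E = W − Wq, and keep a rank-k SVD approximation AkBk of E so that the forward pass becomes y = xWq + (xAk)Bk. The `fake_quantize` helper is an illustrative stand-in for the paper's MXINT block format, all variable names are assumptions, and the L2QER variant used in the experiments additionally incorporates calibration statistics before the SVD, which this plain-LQER sketch omits.

```python
import torch

def fake_quantize(W: torch.Tensor, bits: int = 4, block: int = 16) -> torch.Tensor:
    """Illustrative block-wise symmetric integer fake-quantizer.

    Stand-in only: the paper uses MXINT with [1, 16] / [16, 1] block
    sizes; this simple scaled-integer scheme just mimics the per-block
    scaling idea.
    """
    qmax = 2 ** (bits - 1) - 1
    Wb = W.reshape(-1, block)                      # group weights into blocks
    scale = Wb.abs().amax(dim=-1, keepdim=True).clamp(min=1e-8) / qmax
    Wq = torch.round(Wb / scale).clamp(-qmax - 1, qmax) * scale
    return Wq.reshape_as(W)

def lqer_decompose(W: torch.Tensor, k: int = 32):
    """Approximate W as W_q + A_k @ B_k, where A_k @ B_k is a rank-k
    SVD reconstruction of the quantization error E = W - W_q."""
    W_q = fake_quantize(W)
    E = W - W_q                                    # quantization error
    U, S, Vh = torch.linalg.svd(E, full_matrices=False)
    # One common split of the singular values; the paper's exact
    # factorization of A_k and B_k may differ.
    A_k = U[:, :k] * S[:k]                         # (in, k)
    B_k = Vh[:k, :]                                # (k, out)
    return W_q, A_k, B_k

# Forward-pass sketch: y = x @ W_q + (x @ A_k) @ B_k
W, x = torch.randn(1024, 1024), torch.randn(8, 1024)
W_q, A_k, B_k = lqer_decompose(W, k=32)
y = x @ W_q + (x @ A_k) @ B_k
```

With the Section 4.3 setting of k = 32, the correction term adds only two thin matrix multiplies alongside the low-precision x @ W_q product, which is consistent with the paper's goal of a straight-through inference path with no iterative reconstruction at runtime.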