LQER: Low-Rank Quantization Error Reconstruction for LLMs

Authors: Cheng Zhang, Jianyi Cheng, George Anthony Constantinides, Yiren Zhao

ICML 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "4. Experiments"
Researcher Affiliation | Academia | "1 Department of Electrical and Electronic Engineering, Imperial College London, London, United Kingdom; 2 Department of Computer Science and Technology, University of Cambridge, Cambridge, United Kingdom."
Pseudocode | No | The paper describes its methods using prose and mathematical equations but does not include structured pseudocode or algorithm blocks.
Open Source Code | Yes | "We open-sourced our framework at github.com/ChengZhang-98/lqer."
Open Datasets | Yes | "We report the perplexity on WikiText-2 (Merity et al., 2016) and the accuracy on ARC (easy) (Clark et al., 2018), ARC (challenge) (Clark et al., 2018), LAMBADA (Paperno et al., 2016), PIQA (Bisk et al., 2020), OpenBookQA (Mihaylov et al., 2018), and BoolQ (Clark et al., 2019) using the lm-eval-harness evaluation flow (Gao et al., 2023)... We create a subset of SlimPajama (Soboleva et al., 2023) with Wikipedia texts excluded as the calibration dataset."
Dataset Splits | No | The paper mentions the datasets used for evaluation and a calibration dataset, but it does not explicitly specify training, validation, and test splits with percentages or sample counts.
Hardware Specification | Yes | "The calibration and quantization of LLaMA-33B takes around 1.2 hours in total on a single NVIDIA A100 GPU."
Software Dependencies | No | The paper mentions the 'lm-eval-harness evaluation flow' but does not specify its version or any other software dependencies with explicit version numbers.
Experiment Setup | Yes | "We use MXINT as the number format of LQER if not specified. In Section 4.3, we use W4A8 L2QER with k = 32 to compare with both 4-bit w-only and 4-/6-/8-bit w&a quantization methods. In Section 4.4, we use W2A8 L2QER with k = 256 to compare with 2-bit w-only quantization methods... The block size of MXINT is the default [1, 16] in the original paper (Darvish Rouhani et al., 2020) for Xq ([16, 1] for Wq, Ak, and Bk)."
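To make the experiment-setup row above concrete, here is a minimal sketch of the low-rank quantization error reconstruction idea behind LQER: quantize a weight matrix, take an SVD of the quantization error, and keep only the top-k factors as high-precision correction matrices Ak and Bk. The symmetric integer quantizer below is a simplified stand-in for the block-wise MXINT format used in the paper, the sketch omits the activation-aware scaling that distinguishes L2QER from plain LQER, and all names are illustrative rather than taken from the released code.

```python
import torch

def fake_int_quantize(w: torch.Tensor, n_bits: int = 4) -> torch.Tensor:
    """Symmetric per-tensor fake quantization (a simplified stand-in for the
    block-wise MXINT format the paper actually uses)."""
    qmax = 2 ** (n_bits - 1) - 1
    scale = w.abs().max() / qmax
    return (w / scale).round().clamp(-qmax - 1, qmax) * scale

def lqer_decompose(w: torch.Tensor, n_bits: int = 4, k: int = 32):
    """Return (w_q, a_k, b_k) such that w ~= w_q + a_k @ b_k.

    a_k @ b_k is the rank-k SVD approximation of the quantization error
    w - w_q, kept in higher precision as two small matrices.
    """
    w_q = fake_int_quantize(w, n_bits)
    err = w - w_q                                   # quantization error E
    u, s, vh = torch.linalg.svd(err, full_matrices=False)
    a_k = u[:, :k] * s[:k].sqrt()                   # [out_features, k]
    b_k = s[:k].sqrt().unsqueeze(1) * vh[:k]        # [k, in_features]
    return w_q, a_k, b_k

# Forward pass of a corrected linear layer: the low-rank branch adds two
# skinny matmuls alongside the quantized weight matmul.
w = torch.randn(4096, 4096)
x = torch.randn(1, 4096)
w_q, a_k, b_k = lqer_decompose(w, n_bits=4, k=32)
y = x @ w_q.T + (x @ b_k.T) @ a_k.T                 # ~= x @ w.T
```

The paper's L2QER variant additionally scales the error with activation statistics gathered on the calibration set before the SVD, so that the retained singular directions better match the layer's typical inputs.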
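The Open Datasets row references the lm-eval-harness evaluation flow (Gao et al., 2023). As a rough illustration only, not the authors' actual pipeline, a recent harness release (v0.4+) exposes a Python entry point along these lines; the checkpoint name is a placeholder and task names may differ between harness versions.

```python
import lm_eval

# Placeholder HuggingFace checkpoint; the paper evaluates its own quantized models.
results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=meta-llama/Llama-2-7b-hf,dtype=float16",
    tasks=["wikitext", "arc_easy", "arc_challenge",
           "lambada_openai", "piqa", "openbookqa", "boolq"],
)
print(results["results"])
```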