QuIP#: Even Better LLM Quantization with Hadamard Incoherence and Lattice Codebooks

Authors: Albert Tseng, Jerry Chee, Qingyao Sun, Volodymyr Kuleshov, Christopher De Sa

ICML 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our experiments show that QuIP# outperforms existing PTQ methods, enables new behaviors in PTQ scaling, and supports fast inference.
Researcher Affiliation | Academia | 1 Department of Computer Science, Cornell University; 2 Department of Operations Research and Information Engineering, Cornell University.
Pseudocode | Yes | Algorithm 1: QuIP# without Fine-Tuning (QuIP#-NoFT), with inputs weight W ∈ ℝ^{m×n}, Hessian H ∈ ℝ^{n×n}, and a g-dimensional k-bit codebook C; Algorithm 2: QuIP# Inference (for a Linear Layer); Algorithm 3: Incoherence Processing with RHT (IP-RHT); Algorithm 4: Incoherence Processing with RFFT (IP-RFFT); Algorithm 5: QuIP# with Fine-Tuning. (A sketch of the RHT incoherence step appears after this table.)
Open Source Code | Yes | Our code can be found at https://github.com/Cornell-RelaxML/quip-sharp.
Open Datasets | Yes | Hessian matrices H were generated with 6144 sequences of a model's native context length (2048 for Llama 1, 4096 for Llama 2) from the RedPajama 1T (Computer, 2023) dataset.
Dataset Splits | Yes | We train on a small development dataset of 256 sequences from RedPajama 1T and validate on 128 sequences.
Hardware Specification | Yes | All experiments were run on NVIDIA A100 GPUs, except for the timing numbers, which were measured on an NVIDIA RTX 4090.
Software Dependencies | No | The paper mentions software components such as the Flash Attention library, the Hugging Face library, and a custom CUDA kernel, but does not specify their version numbers for reproducibility.
Experiment Setup | Yes | For the within-transformer-block stage of fine-tuning, we use the Adam optimizer (Kingma & Ba, 2017), a learning rate of 5 × 10⁻⁵, a batch size of 8, and a sequence length equal to the model's native context length. We train on a small development dataset of 256 sequences from RedPajama 1T and validate on 128 sequences. We train for 5 epochs (160 steps) and keep the best model parameters based on the validation set. (A sketch of this recipe follows the table.)
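
The paper's Algorithms 3 and 4 perform incoherence processing by conjugating the weight and proxy Hessian with random orthogonal transforms. Below is a minimal sketch of the random Hadamard transform (RHT) variant, assuming power-of-two layer dimensions and using a dense Hadamard matrix for readability; the released implementation uses fast O(n log n) transforms and fused CUDA kernels, so this is illustrative rather than the authors' code.

```python
# Minimal sketch of incoherence processing with a random Hadamard transform,
# in the spirit of QuIP#'s IP-RHT. Assumes m and n are powers of two and uses
# a dense Hadamard matrix for clarity (not the paper's fast transform kernels).
import torch
from scipy.linalg import hadamard

def random_hadamard(dim: int, seed: int) -> torch.Tensor:
    """Normalized Hadamard matrix times a random +/-1 diagonal (an orthogonal matrix)."""
    g = torch.Generator().manual_seed(seed)
    signs = (torch.randint(0, 2, (dim,), generator=g) * 2 - 1).double()
    H = torch.from_numpy(hadamard(dim)).double() / dim ** 0.5
    return H * signs  # equivalent to H @ diag(signs)

def incoherence_process(W: torch.Tensor, Hess: torch.Tensor, seed: int = 0):
    """Rotate weight W (m x n) and proxy Hessian Hess (n x n) with random
    orthogonal transforms so the rotated weight entries become "incoherent"
    (approximately Gaussian), which suits lattice codebook quantization."""
    m, n = W.shape
    U = random_hadamard(m, seed)        # left rotation, m x m
    V = random_hadamard(n, seed + 1)    # right rotation, n x n
    W_tilde = U @ W.double() @ V.T      # rotated weight, quantized downstream
    H_tilde = V @ Hess.double() @ V.T   # Hessian conjugated by the same right rotation
    return W_tilde, H_tilde, U, V

# At inference, the rotations are undone around the quantized matrix Q ~= W_tilde:
#   y = W x  ~=  U.T @ (Q @ (V @ x))
# so only two fast Hadamard multiplies plus the quantized matmul are needed.
```

Because a random orthogonal rotation spreads each weight entry across all coordinates, the rotated matrix has no outlier entries, which is what makes the lattice codebook quantization effective.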
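
The fine-tuning recipe in the experiment setup row (Adam, learning rate 5 × 10⁻⁵, batch size 8, 5 epochs over 256 RedPajama sequences, best checkpoint chosen on a 128-sequence validation split) amounts to a short training loop. The names `block`, `train_loader`, `val_loader`, and `loss_fn` below are hypothetical placeholders, not the repository's API.

```python
# Sketch of the per-block fine-tuning recipe described in the experiment setup.
# `block`, `train_loader`, `val_loader`, and `loss_fn` are hypothetical placeholders.
import copy
import torch

def finetune_block(block, train_loader, val_loader, loss_fn, epochs: int = 5):
    """Tune one transformer block and keep the best checkpoint by validation loss."""
    opt = torch.optim.Adam(block.parameters(), lr=5e-5)
    best_val = float("inf")
    best_state = copy.deepcopy(block.state_dict())
    for _ in range(epochs):                 # 5 epochs of 32 steps = 160 steps at batch size 8
        block.train()
        for batch in train_loader:          # batches of 8 full-context sequences
            opt.zero_grad()
            loss_fn(block, batch).backward()
            opt.step()
        block.eval()
        with torch.no_grad():
            val = sum(loss_fn(block, b).item() for b in val_loader) / len(val_loader)
        if val < best_val:                  # retain the best parameters seen so far
            best_val = val
            best_state = copy.deepcopy(block.state_dict())
    block.load_state_dict(best_state)
    return block
```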