FrameQuant: Flexible Low-Bit Quantization for Transformers

Authors: Harshavardhan Adepu, Zhanpeng Zeng, Li Zhang, Vikas Singh

ICML 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We performed an extensive set of experiments comparing FrameQuant with several quantization baselines for Vision models and Language models. The goal is to assess (a) the performance of the different methods on benchmark tasks and (b) how closely low-bit quantization can approach full-precision performance with a small degree of representation redundancy. We use the image classification task (Deng et al., 2009) for Vision models and perplexity for Language models.
Researcher Affiliation | Collaboration | 1) University of Wisconsin-Madison, 2) Google Research. Correspondence to: Harshavardhan Adepu <adepu@wisc.edu>
Pseudocode | Yes | Algorithm 1 (FrameQuant). Require: weight matrix Θ_l, previous-layer activations A_prev, input and output Fusion Frames P_l, P_prev, block size B. 1: Compute C_prev = P_prev^T A_prev and D_l = P_l^T Θ_l P_prev. 2: Compute σ = std(D_l), µ = mean(D_l). 3: D_l = clip(D_l, µ − 2σ, µ + 2σ). 4: D̂_l = quantize(D_l, C_prev, B) // modified GPTQ. 5: Store the quantized matrix D̂_l. Return P_l D̂_l C_prev // the quantized layer activations. (A Python sketch of this per-layer step is given after the table.)
Open Source Code | Yes | The code is available at https://github.com/vsingh-group/FrameQuant
Open Datasets | Yes | We evaluate our method on the ImageNet-1K classification task.
Dataset Splits | Yes | Finally, we evaluate the quantized models on the ImageNet-1K validation dataset and report the top-1 accuracy. (An evaluation sketch appears after the table.)
Hardware Specification | Yes | Table 7 shows the inference speeds of the quantized models on an NVIDIA A100 GPU.
Software Dependencies | No | The paper mentions using the Huggingface hub, but does not list specific software dependencies (e.g., Python, PyTorch, or other libraries) with version numbers.
Experiment Setup | Yes | For quantizing the model weights of the pre-trained models obtained from the Huggingface hub (Wightman, 2019), we use 128 randomly selected images from the training dataset as the calibration dataset D. We quantize the parameter matrices of the layers sequentially from shallow layers to deep layers, similar to (Frantar et al., 2023). (A sketch of this setup appears after the table.)
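
Below is a minimal PyTorch sketch of the per-layer step in Algorithm 1 from the Pseudocode row. It assumes the Fusion Frame operators P_l and P_prev are given as dense Parseval-frame matrices (P P^T = I), and `quantize_blockwise` is only a placeholder for the paper's modified GPTQ routine; this illustrates the data flow, not the authors' implementation.

```python
import torch

def quantize_blockwise(D, C_prev, block_size):
    """Placeholder for the modified GPTQ step: naive symmetric rounding to a
    4-level (2-bit) grid per block of columns. The real routine uses C_prev to
    build a Hessian and compensates rounding error column by column."""
    D_hat = D.clone()
    for start in range(0, D.shape[1], block_size):
        block = D[:, start:start + block_size]
        scale = block.abs().max().clamp(min=1e-8) / 2.0
        q = torch.round(block / scale).clamp(-2, 1)      # 4 integer levels
        D_hat[:, start:start + block_size] = q * scale
    return D_hat

def framequant_layer(theta_l, a_prev, P_l, P_prev, block_size=128):
    """One FrameQuant layer step (Algorithm 1). Assumed shapes:
    theta_l: (d_out, d_in), a_prev: (d_in, n),
    P_l: (d_out, k_l), P_prev: (d_in, k_prev)."""
    # 1: project activations and weights into the frame domain
    C_prev = P_prev.T @ a_prev
    D_l = P_l.T @ theta_l @ P_prev
    # 2: statistics of the frame coefficients
    sigma, mu = D_l.std().item(), D_l.mean().item()
    # 3: clip outliers to mu +/- 2*sigma
    D_l = D_l.clamp(mu - 2 * sigma, mu + 2 * sigma)
    # 4: block-wise quantization (modified GPTQ in the paper)
    D_hat = quantize_blockwise(D_l, C_prev, block_size)
    # 5: D_hat is what gets stored; the layer output is reconstructed from it
    return P_l @ D_hat @ C_prev
```

With identity frames (P_l = I, P_prev = I) this reduces to ordinary weight quantization; the small redundancy of the frames (k slightly larger than d) is what the paper credits for the robustness of low-bit quantization.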
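
The Dataset Splits row reports top-1 accuracy on the ImageNet-1K validation set. A minimal sketch of such an evaluation loop is shown below; the dataset path, batch size, and loader settings are assumptions, not values from the paper.

```python
import torch
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
])
# Path and batch size are illustrative assumptions.
val_loader = DataLoader(
    datasets.ImageFolder("imagenet/val", transform=preprocess),
    batch_size=64, num_workers=4)

@torch.no_grad()
def top1_accuracy(model, loader):
    """Fraction of validation images whose highest-scoring class is correct."""
    model.eval()
    correct, total = 0, 0
    for images, labels in loader:
        preds = model(images).argmax(dim=1)
        correct += (preds == labels).sum().item()
        total += labels.numel()
    return correct / total

# Usage: acc = top1_accuracy(quantized_model, val_loader)
```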
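
For the Experiment Setup row, the sketch below illustrates the described procedure: a pre-trained ViT from the timm/Huggingface hub, 128 randomly sampled training images as the calibration set, and layers visited from shallow to deep. The model name, dataset path, and the elided per-layer details are assumptions; `framequant_layer` refers to the sketch above.

```python
import random
import timm
import torch
from torchvision import datasets, transforms

# Pre-trained ViT from the timm / Huggingface hub (model name is illustrative).
model = timm.create_model("vit_base_patch16_224", pretrained=True).eval()

# 128 randomly selected training images as the calibration dataset D.
preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
])
train_set = datasets.ImageFolder("imagenet/train", transform=preprocess)  # path is an assumption
calib_idx = random.sample(range(len(train_set)), 128)
calib = torch.stack([train_set[i][0] for i in calib_idx])

# Quantize parameter matrices sequentially from shallow to deep layers, so each
# layer is calibrated on activations produced by the already-quantized prefix.
with torch.no_grad():
    for block in model.blocks:          # transformer blocks in depth order
        for name, module in block.named_modules():
            if isinstance(module, torch.nn.Linear):
                # Here one would record this layer's inputs on `calib` (e.g. via a
                # forward pre-hook), build the frame operators, and update the
                # stored weights using the framequant_layer step sketched above.
                pass
```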