Q-VLM: Post-training Quantization for Large Vision-Language Models

Authors: Changyuan Wang, Ziwei Wang, Xiuwei Xu, Yansong Tang, Jie Zhou, Jiwen Lu

NeurIPS 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental results demonstrate that our method compresses the memory by 2.78x and increases generation speed by 1.44x for the 13B LLaVA model without performance degradation on diverse multi-modal reasoning tasks.
Researcher Affiliation | Academia | 1) Shenzhen International Graduate School, Tsinghua University, China; 2) Department of Automation, Tsinghua University, China; 3) School of Electrical and Electronic Engineering, Nanyang Technological University
Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks clearly labeled as such.
Open Source Code | Yes | Code is available at https://github.com/ChangyuanWang17/QVLM
Open Datasets | Yes | We utilize the large vision-language frameworks for post-training quantization including LLaVA [31] and MoE-LLaVA [28] with their pre-trained weights for multi-modal question answering tasks. ... The multi-modal answer reasoning dataset is ScienceQA [35], which contains 21k vision-language multiple-choice questions. We also include the VizWiz [15] and VQA-v2 [14] datasets.
Dataset Splits | Yes | For the parameter learning in LVLM quantization, we randomly select 64 vision-language pairs from the datasets for hyper-network learning, where the batch size was set to 8 for calibration set construction. (A calibration-set sketch follows the table.)
Hardware Specification | No | The paper does not provide specific hardware details (GPU/CPU models, processor types, or memory amounts) used for running its experiments within the main text.
Software Dependencies | No | The paper mentions frameworks and methods used (e.g., LLaVA, MoE-LLaVA, QLoRA, AWQ) but does not provide specific version numbers for any software components or libraries.
Experiment Setup | Yes | We set the bitwidths of the quantized weights and activations to 6 and 4 to evaluate our method under different quality-efficiency trade-offs, using a uniform quantization scheme where the interval between adjacent rounding points is equal. ... We set the maximum layer depth to 3 within a block... In the LVLM quantization exploration, we adjust the percentile hyperparameter p from 1.0 to 0.98 with a 0.005 interval... we modified the hyperparameter η... For the parameter learning in LVLM quantization, we randomly select 64 vision-language pairs from the datasets for hyper-network learning, where the batch size was set to 8 for calibration set construction. The quantization function parameters were updated for 10 epochs in the searching process... (A uniform-quantization sketch follows the table.)
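
For readers who want to mirror the calibration set described in the Dataset Splits row, here is a minimal sketch of drawing 64 random vision-language pairs and batching them with batch size 8. The function name, the fixed seed, and the generic map-style torch Dataset are illustrative assumptions, not code from the QVLM repository.

```python
import torch
from torch.utils.data import DataLoader, Dataset, Subset

def build_calibration_loader(dataset: Dataset, num_pairs: int = 64,
                             batch_size: int = 8, seed: int = 0) -> DataLoader:
    """Randomly draw a small calibration set of vision-language pairs.

    Mirrors the reported setting: 64 randomly selected pairs, batched with
    batch size 8 for calibration-set construction. Assumes a map-style
    dataset (implements __len__ and __getitem__).
    """
    generator = torch.Generator().manual_seed(seed)
    indices = torch.randperm(len(dataset), generator=generator)[:num_pairs].tolist()
    return DataLoader(Subset(dataset, indices), batch_size=batch_size, shuffle=False)
```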
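
The Experiment Setup row describes uniform W6/A4 quantization and a percentile clipping hyperparameter p swept from 1.0 down to 0.98 in steps of 0.005. The sketch below shows one plausible reading: a symmetric uniform quantizer whose clipping range is the p-th quantile of the tensor's magnitudes. All names and the toy tensors are assumptions for illustration and do not reproduce the authors' implementation (the block-wise search over layer depth is omitted).

```python
import torch

def uniform_quantize(x: torch.Tensor, bits: int, p: float = 1.0) -> torch.Tensor:
    """Symmetric uniform quantization with percentile clipping.

    The clipping threshold is the p-th quantile of |x| (p = 1.0 keeps the
    full range), and the interval between adjacent rounding points is equal.
    """
    max_val = torch.quantile(x.abs().flatten().float(), p).item()
    qmax = 2 ** (bits - 1) - 1            # e.g. 31 for 6-bit, 7 for 4-bit
    scale = max_val / qmax
    x_clipped = torch.clamp(x, -max_val, max_val)
    return torch.round(x_clipped / scale) * scale  # fake-quantized (de-quantized) output

# Toy sweep over the reported percentile range: 1.0 down to 0.98 in 0.005 steps.
if __name__ == "__main__":
    weight = torch.randn(256, 256)        # stand-in for a 6-bit weight tensor
    activation = torch.randn(8, 256)      # stand-in for a 4-bit activation tensor
    for step in range(5):
        p = 1.0 - 0.005 * step
        w_err = (weight - uniform_quantize(weight, bits=6, p=p)).pow(2).mean()
        a_err = (activation - uniform_quantize(activation, bits=4, p=p)).pow(2).mean()
        print(f"p={p:.3f}  weight MSE={w_err.item():.6f}  activation MSE={a_err.item():.6f}")
```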