QA-LoRA: Quantization-Aware Low-Rank Adaptation of Large Language Models

Authors: Yuhui Xu, Lingxi Xie, Xiaotao Gu, Xin Chen, Heng Chang, Hengheng Zhang, Zhengsu Chen, Xiaopeng Zhang, Qi Tian

ICLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate QA-LoRA on the LLaMA and LLaMA2 model families (Touvron et al., 2023a;b) and validate it on various language understanding benchmarks. Figure 1 shows the comparison of 5-shot accuracy on the MMLU benchmark between QA-LoRA and the direct baseline, QLoRA (Dettmers et al., 2023a). QA-LoRA consistently outperforms QLoRA with PTQ on top of LLMs of different scales (the advantage becomes more significant when the quantization bit width is lower) and is on par with QLoRA without PTQ. (A hedged evaluation sketch appears after this table.)
Researcher Affiliation | Industry | Huawei Inc.
Pseudocode | Yes | Algorithm 1: QA-LoRA pseudocode in the PyTorch-like style. (A hedged sketch of the layer appears after this table.)
Open Source Code | Yes | The code is made available at https://github.com/yuhuixu1993/qa-lora.
Open Datasets | Yes | We choose Alpaca (Taori et al., 2023) and FLAN v2 (Longpre et al., 2023) as our fine-tuning datasets.
Dataset Splits | No | The paper states that Alpaca and FLAN v2 are used as fine-tuning datasets and MMLU for evaluation, but it does not explicitly describe the training/validation/test splits of these datasets, so the data partitioning cannot be reproduced from the paper alone.
Hardware Specification | Yes | All experiments are conducted on Tesla V100 GPUs. We use one GPU for the 7B, 13B, and 33B models and two GPUs for the 65B models.
Software Dependencies | No | The paper mentions software components like PyTorch (implicitly, via the "PyTorch-like style" pseudocode), CUDA, GPTQ, and lm-eval-harness, but it does not specify their version numbers.
Experiment Setup | Yes | Following QLoRA (Dettmers et al., 2023a), we use a paged AdamW optimizer, a maximum gradient norm of 0.3, and a batch size of 16 in the tuning period. We choose the constant learning rate schedule and set the learning rate to be 2 × 10^-5 for the 7B and 13B models and 1 × 10^-5 for the 33B and 65B models. The number of fine-tuning steps is 10K for Alpaca and 20K for FLAN v2. (A hedged configuration sketch appears after this table.)
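
For the 5-shot MMLU comparison cited in the "Research Type" row, the sketch below shows how such an evaluation could be run with the EleutherAI lm-evaluation-harness Python API, assuming version 0.4 or later. The checkpoint name and batch size are illustrative assumptions; the paper does not state the exact invocation it used.

```python
# Minimal sketch, not the authors' script: 5-shot MMLU with lm-evaluation-harness (assumed v0.4+).
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",                                        # Hugging Face causal-LM backend
    model_args="pretrained=meta-llama/Llama-2-7b-hf",  # assumed checkpoint name
    tasks=["mmlu"],                                    # MMLU benchmark
    num_fewshot=5,                                     # 5-shot, as reported in the paper
    batch_size=8,                                      # assumed
)
print(results["results"])                              # per-task and aggregate accuracies
```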
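The "Pseudocode" row notes that Algorithm 1 gives QA-LoRA pseudocode in a PyTorch-like style; that pseudocode is not reproduced here. The snippet below is a minimal sketch of the mechanism the paper describes: the frozen weight is dequantized group-wise, and the LoRA branch operates on a group-wise average-pooled input so that every input dimension in a quantization group shares one adapter coefficient (which is what later allows the update to be folded into the per-group quantization parameters). The class name, tensor names, initialization, and the omission of the LoRA scaling factor are all assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class QALoRALinearSketch(nn.Module):
    """Illustrative QA-LoRA-style linear layer (a sketch, not the authors' code)."""

    def __init__(self, in_features, out_features, group_size=64, rank=16):
        super().__init__()
        assert in_features % group_size == 0
        self.group_size = group_size
        num_groups = in_features // group_size
        # Frozen group-wise quantized weight and its per-group quantization parameters
        # (placeholder values; a real layer would load these from a quantizer such as GPTQ).
        self.register_buffer("w_q", torch.zeros(out_features, in_features, dtype=torch.int8))
        self.register_buffer("scale", torch.ones(out_features, num_groups))
        self.register_buffer("zero", torch.zeros(out_features, num_groups))
        # LoRA factors: lora_a acts on the group-pooled input (size num_groups), not in_features.
        self.lora_a = nn.Parameter(0.01 * torch.randn(rank, num_groups))
        self.lora_b = nn.Parameter(torch.zeros(out_features, rank))

    def dequantize(self):
        # Expand per-group scale/zero to full width: w = scale * (w_q - zero), group-wise.
        s = self.scale.repeat_interleave(self.group_size, dim=1)
        z = self.zero.repeat_interleave(self.group_size, dim=1)
        return s * (self.w_q.float() - z)

    def forward(self, x):
        # Frozen quantized path.
        y = F.linear(x, self.dequantize())
        # Average-pool the input within each quantization group, then apply the low-rank update;
        # all input dimensions in a group therefore share one adapter coefficient.
        pooled = x.view(*x.shape[:-1], -1, self.group_size).mean(dim=-1)
        return y + F.linear(F.linear(pooled, self.lora_a), self.lora_b)


# Usage sketch: a single forward pass with random activations.
layer = QALoRALinearSketch(in_features=4096, out_features=4096, group_size=64, rank=16)
out = layer(torch.randn(2, 4096))
```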
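The "Experiment Setup" row lists the fine-tuning hyperparameters. Below is a minimal sketch of how they could be expressed with the Hugging Face transformers TrainingArguments API; the output directory, the 32-bit paged-AdamW variant, and the choice of this particular API are assumptions, while the numeric values follow the paper (shown for the 7B/13B Alpaca setting).

```python
from transformers import TrainingArguments

# Hypothetical mapping of the reported hyperparameters onto TrainingArguments
# (7B/13B model fine-tuned on Alpaca); only the numeric values come from the paper.
training_args = TrainingArguments(
    output_dir="qa-lora-7b-alpaca",     # assumed name
    optim="paged_adamw_32bit",          # "paged AdamW"; the 32-bit variant is an assumption
    max_grad_norm=0.3,                  # maximum gradient norm of 0.3
    per_device_train_batch_size=16,     # batch size of 16
    lr_scheduler_type="constant",       # constant learning-rate schedule
    learning_rate=2e-5,                 # 2 × 10^-5 for 7B/13B (1 × 10^-5 for 33B/65B)
    max_steps=10_000,                   # 10K steps for Alpaca (20K for FLAN v2)
)
```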