QA-LoRA: Quantization-Aware Low-Rank Adaptation of Large Language Models

Authors: Yuhui Xu, Lingxi Xie, Xiaotao Gu, Xin Chen, Heng Chang, Hengheng Zhang, Zhengsu Chen, Xiaopeng Zhang, Qi Tian

ICLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate QA-LoRA on the LLaMA and LLaMA2 model families (Touvron et al., 2023a;b) and validate it on various language understanding benchmarks. Figure 1 shows the comparison of 5-shot accuracy on the MMLU benchmark between QA-LoRA and the direct baseline, QLoRA (Dettmers et al., 2023a). QA-LoRA consistently outperforms QLoRA with PTQ on top of LLMs of different scales (the advantage becomes more significant when the quantization bit width is lower) and is on par with QLoRA without PTQ. (A hedged evaluation sketch appears after this table.)
Researcher Affiliation | Industry | Huawei Inc.
Pseudocode | Yes | Algorithm 1: QA-LoRA pseudocode in the PyTorch-like style. (A hedged sketch of the layer appears after this table.)
Open Source Code | Yes | The code is made available at https://github.com/yuhuixu1993/qa-lora.
Open Datasets | Yes | We choose Alpaca (Taori et al., 2023) and FLAN v2 (Longpre et al., 2023) as our fine-tuning datasets.
Dataset Splits | No | The paper states that Alpaca and FLAN v2 are used as fine-tuning datasets and MMLU for evaluation, but it does not explicitly describe the training/validation/test splits of these datasets, so the data partitioning cannot be reproduced from the paper alone.
Hardware Specification | Yes | All experiments are conducted on Tesla V100 GPUs. We use one GPU for the 7B, 13B, and 33B models and two GPUs for the 65B models.
Software Dependencies | No | The paper mentions software components like PyTorch (implicitly, via the "PyTorch-like style" pseudocode), CUDA, GPTQ, and lm-eval-harness, but it does not specify their version numbers.
Experiment Setup | Yes | Following QLoRA (Dettmers et al., 2023a), we use a paged AdamW optimizer, a maximum gradient norm of 0.3, and a batch size of 16 in the tuning period. We choose the constant learning rate schedule and set the learning rate to be 2 × 10^-5 for the 7B and 13B models and 1 × 10^-5 for the 33B and 65B models. The number of fine-tuning steps is 10K for Alpaca and 20K for FLAN v2. (A hedged configuration sketch appears after this table.)
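
For the 5-shot MMLU comparison cited in the "Research Type" row, the sketch below shows how such an evaluation could be run with the EleutherAI lm-evaluation-harness Python API, assuming version 0.4 or later. The checkpoint name and batch size are illustrative assumptions; the paper does not state the exact invocation it used.

```python
# Minimal sketch, not the authors' script: 5-shot MMLU with lm-evaluation-harness (assumed v0.4+).
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",                                        # Hugging Face causal-LM backend
    model_args="pretrained=meta-llama/Llama-2-7b-hf",  # assumed checkpoint name
    tasks=["mmlu"],                                    # MMLU benchmark
    num_fewshot=5,                                     # 5-shot, as reported in the paper
    batch_size=8,                                      # assumed
)
print(results["results"])                              # per-task and aggregate accuracies
```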
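The "Pseudocode" row notes that Algorithm 1 gives QA-LoRA pseudocode in a PyTorch-like style; that pseudocode is not reproduced here. The snippet below is a minimal sketch of the mechanism the paper describes: the frozen weight is dequantized group-wise, and the LoRA branch operates on a group-wise average-pooled input so that every input dimension in a quantization group shares one adapter coefficient (which is what later allows the update to be folded into the per-group quantization parameters). The class name, tensor names, initialization, and the omission of the LoRA scaling factor are all assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class QALoRALinearSketch(nn.Module):
    """Illustrative QA-LoRA-style linear layer (a sketch, not the authors' code)."""

    def __init__(self, in_features, out_features, group_size=64, rank=16):
        super().__init__()
        assert in_features % group_size == 0
        self.group_size = group_size
        num_groups = in_features // group_size
        # Frozen group-wise quantized weight and its per-group quantization parameters
        # (placeholder values; a real layer would load these from a quantizer such as GPTQ).
        self.register_buffer("w_q", torch.zeros(out_features, in_features, dtype=torch.int8))
        self.register_buffer("scale", torch.ones(out_features, num_groups))
        self.register_buffer("zero", torch.zeros(out_features, num_groups))
        # LoRA factors: lora_a acts on the group-pooled input (size num_groups), not in_features.
        self.lora_a = nn.Parameter(0.01 * torch.randn(rank, num_groups))
        self.lora_b = nn.Parameter(torch.zeros(out_features, rank))

    def dequantize(self):
        # Expand per-group scale/zero to full width: w = scale * (w_q - zero), group-wise.
        s = self.scale.repeat_interleave(self.group_size, dim=1)
        z = self.zero.repeat_interleave(self.group_size, dim=1)
        return s * (self.w_q.float() - z)

    def forward(self, x):
        # Frozen quantized path.
        y = F.linear(x, self.dequantize())
        # Average-pool the input within each quantization group, then apply the low-rank update;
        # all input dimensions in a group therefore share one adapter coefficient.
        pooled = x.view(*x.shape[:-1], -1, self.group_size).mean(dim=-1)
        return y + F.linear(F.linear(pooled, self.lora_a), self.lora_b)


# Usage sketch: a single forward pass with random activations.
layer = QALoRALinearSketch(in_features=4096, out_features=4096, group_size=64, rank=16)
out = layer(torch.randn(2, 4096))
```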
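The "Experiment Setup" row lists the fine-tuning hyperparameters. Below is a minimal sketch of how they could be expressed with the Hugging Face transformers TrainingArguments API; the output directory, the 32-bit paged-AdamW variant, and the choice of this particular API are assumptions, while the numeric values follow the paper (shown for the 7B/13B Alpaca setting).

```python
from transformers import TrainingArguments

# Hypothetical mapping of the reported hyperparameters onto TrainingArguments
# (7B/13B model fine-tuned on Alpaca); only the numeric values come from the paper.
training_args = TrainingArguments(
    output_dir="qa-lora-7b-alpaca",     # assumed name
    optim="paged_adamw_32bit",          # "paged AdamW"; the 32-bit variant is an assumption
    max_grad_norm=0.3,                  # maximum gradient norm of 0.3
    per_device_train_batch_size=16,     # batch size of 16
    lr_scheduler_type="constant",       # constant learning-rate schedule
    learning_rate=2e-5,                 # 2 × 10^-5 for 7B/13B (1 × 10^-5 for 33B/65B)
    max_steps=10_000,                   # 10K steps for Alpaca (20K for FLAN v2)
)
```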