Accurate LoRA-Finetuning Quantization of LLMs via Information Retention
Authors: Haotong Qin, Xudong Ma, Xingyu Zheng, Xiaoyang Li, Yang Zhang, Shouda Liu, Jie Luo, Xianglong Liu, Michele Magno
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Comprehensive experiments show that IR-QLoRA can significantly improve accuracy across LLaMA and LLaMA2 families under 2-4 bit-widths, e.g., 4-bit LLaMA-7B achieves 1.4% improvement on MMLU compared with the state-of-the-art methods. |
| Researcher Affiliation | Collaboration | 1ETH Zürich 2Beihang University 3Bytedance AI Lab. |
| Pseudocode | Yes | Algorithm 1 The weight search process within each block in IR-QLoRA |
| Open Source Code | Yes | The code is available at https://github.com/htqin/ir-qlora. |
| Open Datasets | Yes | Our IR-QLoRA is established upon the LLaMA (Touvron et al., 2023a) and LLaMA2 (Touvron et al., 2023b) families... and constructs parameter-efficient finetuning on Alpaca (Taori et al., 2023) and Flan v2 (Longpre et al., 2023) datasets. |
| Dataset Splits | No | The paper states that Alpaca and Flan v2 datasets were used for finetuning, and MMLU and CommonsenseQA benchmarks for evaluation, but does not explicitly provide training/validation/test splits for the finetuning datasets themselves. |
| Hardware Specification | Yes | All our experiments are conducted on Nvidia Tesla A100 GPUs. |
| Software Dependencies | No | The paper mentions using an optimizer (paged AdamW) and specifies hyperparameters, but does not provide specific version numbers for software dependencies such as deep learning frameworks or libraries. |
| Experiment Setup | Yes | Following (Dettmers et al., 2023), we apply the double quantization mechanism, and set the block size to 64 for quantization and 256 for double quantization. Regarding LoRA parameters, we set r = 64, α = 16, and LoRA dropout of 0.1 for models up to 13B and 0.05 for 33B and 65B models. We employ the paged AdamW optimizer with a beta2 value of 0.999, and a learning rate of 2e-4 for models up to 13B and 1e-4 for 33B and 65B models, limiting the maximum gradient norm to 0.3 and adopting a constant learning rate strategy. Fine-tuning was executed for 10,000 and 20,000 steps on the Alpaca and FLAN v2 datasets, respectively, utilizing a batch size of 16. |
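
The reported hyperparameters split by model size (learning rate and LoRA dropout differ for models up to 13B versus the 33B/65B models). The selection logic can be sketched as follows; the function name and dictionary layout are illustrative, not from the paper, but the values are taken from the experiment-setup quote above.

```python
def finetune_hparams(model_size_b: int) -> dict:
    """Return the IR-QLoRA finetuning hyperparameters reported in the
    paper for a given LLaMA model size (in billions of parameters).

    The function name and dict keys are illustrative; the values come
    from the experiment-setup description quoted above.
    """
    small = model_size_b <= 13  # models up to 13B vs. 33B/65B models
    return {
        "lora_r": 64,
        "lora_alpha": 16,
        "lora_dropout": 0.1 if small else 0.05,
        "optimizer": "paged_adamw",
        "adam_beta2": 0.999,
        "learning_rate": 2e-4 if small else 1e-4,
        "max_grad_norm": 0.3,
        "lr_schedule": "constant",
        "batch_size": 16,
        "quant_block_size": 64,          # block size for quantization
        "double_quant_block_size": 256,  # block size for double quantization
    }
```

For example, `finetune_hparams(7)` yields the 7B settings (learning rate 2e-4, LoRA dropout 0.1), while `finetune_hparams(65)` yields the large-model settings (learning rate 1e-4, LoRA dropout 0.05).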