Memory-Efficient Fine-Tuning of Compressed Large Language Models via sub-4-bit Integer Quantization

Authors: Jeonghoon Kim, Jung Hyun Lee, Sungdong Kim, Joonsuk Park, Kang Min Yoo, Se Jung Kwon, Dongsoo Lee

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental (4 experiments) | In this section, we empirically validate the effectiveness of our proposed PEQA method by examining its performance in both parameter-efficient fine-tuning (PEFT) and as a quantization method. We achieve this goal by using a series of benchmarks [52, 57], datasets [51, 58, 59], and LLMs [4, 6, 60, 61] that have been publicly introduced.
Researcher Affiliation | Collaboration | Jeonghoon Kim (NAVER Cloud) jeonghoon.samuel@gmail.com; Jung Hyun Lee (NAVER Cloud) onliwad101@gmail.com; Sungdong Kim (NAVER Cloud, KAIST AI) sungdong.kim@navercorp.com; Joonsuk Park (NAVER Cloud, NAVER AI Lab, University of Richmond) park@joonsuk.org; Kang Min Yoo (NAVER Cloud, SNU AI Center) kangmin.yoo@gmail.com; Se Jung Kwon (NAVER Cloud) sejung.kwon@navercorp.com; Dongsoo Lee (NAVER Cloud) dongsoo.lee@navercorp.com
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks.
Open Source Code | No | We utilize Huggingface repository [66] for training, evaluation code and dataset.
Open Datasets | Yes | We fine-tune and assess LLMs on the Wikitext2 [51] and Penn Tree Bank (PTB) [58] datasets using PEQA and LoRA [21].
Dataset Splits | No | The paper does not provide specific data split information (exact percentages, sample counts, citations to predefined splits, or detailed splitting methodology) needed to reproduce the data partitioning.
Hardware Specification | Yes | To provide a clear understanding of these benefits, we conducted tests using a single NVIDIA A100-80GB GPU and the causal language modeling code from the Hugging Face repository.
Software Dependencies | No | For the common experimental settings, AdamW [64] optimizer and linear-decaying learning rate scheduler were used. We use Deepspeed repository [65] for FP16 and BF16 training. Additionally, we utilize Huggingface repository [66] for training, evaluation code and dataset.
Experiment Setup | Yes | Batch size and epoch for all experiments are set to 128 and 15 respectively. The learning rates for the experiments of Table 2 are displayed in Table 8.
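
The Research Type row above concerns PEQA, which the paper evaluates both as a parameter-efficient fine-tuning (PEFT) method and as a quantization method. PEQA keeps sub-4-bit integer weights frozen and fine-tunes only the per-channel quantization scales. The following is a minimal PyTorch sketch of that idea, assuming a symmetric round-to-nearest quantizer; the class name and quantization details are illustrative, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PEQALinearSketch(nn.Module):
    """Illustrative PEQA-style linear layer: frozen sub-4-bit integer weights,
    trainable per-output-channel scales (assumed symmetric round-to-nearest)."""

    def __init__(self, weight: torch.Tensor, bits: int = 4):
        super().__init__()
        qmax = 2 ** (bits - 1) - 1                              # e.g. 7 for 4-bit symmetric
        scale = weight.abs().amax(dim=1, keepdim=True) / qmax   # one scale per output channel
        q = torch.clamp(torch.round(weight / scale), -qmax - 1, qmax)
        self.register_buffer("q_weight", q.to(torch.int8))      # frozen integer weights
        self.scale = nn.Parameter(scale)                        # the only trainable weight tensor
        self.bias = nn.Parameter(torch.zeros(weight.shape[0]))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        w = self.scale * self.q_weight.to(x.dtype)              # dequantize on the fly
        return F.linear(x, w, self.bias)
```

Because only the scales (and biases) receive gradients, optimizer state and gradient memory grow with the number of output channels rather than with the full weight matrix, which is the memory saving the paper's title refers to.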
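The Open Datasets row names Wikitext2 and Penn Tree Bank (PTB). Both are publicly available through the Hugging Face datasets hub; the identifiers and configurations below are the commonly used public ones and are an assumption, since the paper does not state exactly how the data were obtained.

```python
from datasets import load_dataset

# Hub identifiers assumed; the paper names the datasets but not the exact configurations.
wikitext2 = load_dataset("wikitext", "wikitext-2-raw-v1")
ptb = load_dataset("ptb_text_only", "penn_treebank")

print(wikitext2)  # DatasetDict with train/validation/test splits
print(ptb)
```

The split structure printed above comes from the hub datasets themselves, not from the paper, which is consistent with the Dataset Splits row being marked No.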
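The Software Dependencies and Experiment Setup rows report AdamW, a linear-decaying learning-rate schedule, batch size 128, and 15 epochs, with the learning rates themselves deferred to Table 8 of the paper. The sketch below wires those reported choices together with standard PyTorch/Transformers utilities; the learning-rate value, the zero-warmup setting, and the assumption that only still-trainable parameters are optimized are placeholders, not values taken from the paper.

```python
import torch
from transformers import get_linear_schedule_with_warmup

def build_optimizer_and_scheduler(model, steps_per_epoch: int,
                                  lr: float = 2e-4,   # placeholder; the paper defers LRs to its Table 8
                                  epochs: int = 15):  # reported epoch count
    # Only parameters that remain trainable after quantization receive updates.
    trainable = [p for p in model.parameters() if p.requires_grad]
    optimizer = torch.optim.AdamW(trainable, lr=lr)
    # Linear decay over all training steps, matching the reported scheduler choice.
    scheduler = get_linear_schedule_with_warmup(
        optimizer, num_warmup_steps=0, num_training_steps=steps_per_epoch * epochs)
    return optimizer, scheduler
```

The reported batch size of 128 would typically be reached through gradient accumulation on the single A100-80GB named in the Hardware Specification row, and the FP16/BF16 training mentioned above is handled through DeepSpeed, for which the paper provides no configuration file.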