Memory-Efficient Fine-Tuning of Compressed Large Language Models via sub-4-bit Integer Quantization
Authors: Jeonghoon Kim, Jung Hyun Lee, Sungdong Kim, Joonsuk Park, Kang Min Yoo, Se Jung Kwon, Dongsoo Lee
NeurIPS 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Section 4 (Experiments): In this section, we empirically validate the effectiveness of our proposed PEQA method by examining its performance in both parameter-efficient fine-tuning (PEFT) and as a quantization method. We achieve this goal by using a series of benchmarks [52, 57], datasets [51, 58, 59], and LLMs [4, 6, 60, 61] that have been publicly introduced. |
| Researcher Affiliation | Collaboration | Jeonghoon Kim (NAVER Cloud, jeonghoon.samuel@gmail.com); Jung Hyun Lee (NAVER Cloud, onliwad101@gmail.com); Sungdong Kim (NAVER Cloud, KAIST AI, sungdong.kim@navercorp.com); Joonsuk Park (NAVER Cloud, NAVER AI Lab, University of Richmond, park@joonsuk.org); Kang Min Yoo (NAVER Cloud, SNU AI Center, kangmin.yoo@gmail.com); Se Jung Kwon (NAVER Cloud, sejung.kwon@navercorp.com); Dongsoo Lee (NAVER Cloud, dongsoo.lee@navercorp.com) |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | No | We utilize the Huggingface repository [66] for training, evaluation code and dataset. |
| Open Datasets | Yes | We fine-tune and assess LLMs on the Wikitext2 [51] and Penn Tree Bank (PTB) [58] datasets using PEQA and LoRA [21]. |
| Dataset Splits | No | The paper does not provide specific data split information (exact percentages, sample counts, citations to predefined splits, or detailed splitting methodology) needed to reproduce the data partitioning. |
| Hardware Specification | Yes | To provide a clear understanding of these benefits, we conducted tests using a single NVIDIA A100-80GB GPU and the causal language modeling code from the Hugging Face repository. |
| Software Dependencies | No | For the common experimental settings, the AdamW [64] optimizer and a linear-decaying learning rate scheduler were used. We use the Deepspeed repository [65] for FP16 and BF16 training. Additionally, we utilize the Huggingface repository [66] for training, evaluation code and dataset. |
| Experiment Setup | Yes | Batch size and epoch for all experiments are set to 128 and 15, respectively. The learning rates for the experiments of Table 2 are displayed in Table 8. (See the configuration sketch below the table.) |
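
The experiment-setup and software-dependency rows above describe the paper's common training recipe: the AdamW optimizer, a linearly decaying learning-rate schedule, batch size 128, and 15 epochs, with training code built on the Hugging Face repository. The sketch below illustrates that recipe only; it is not the authors' released code. The stand-in `nn.Linear` model, `steps_per_epoch`, `base_lr`, and the zero-warmup choice are assumptions, and the PEQA-specific step of freezing the quantized integer weights while training only the per-channel quantization scales is not reproduced here.

```python
# Minimal sketch of the common training configuration reported above:
# AdamW + linear learning-rate decay, batch size 128, 15 epochs.
import torch
from torch.optim import AdamW
from transformers import get_linear_schedule_with_warmup

batch_size = 128          # reported in the paper
num_epochs = 15           # reported in the paper
steps_per_epoch = 100     # hypothetical; depends on dataset size and batch size
base_lr = 1e-4            # hypothetical; the paper's Table 8 lists per-model learning rates

# Stand-in for the trainable parameters (PEQA trains only quantization scales).
model = torch.nn.Linear(512, 512)

optimizer = AdamW(model.parameters(), lr=base_lr)
scheduler = get_linear_schedule_with_warmup(
    optimizer,
    num_warmup_steps=0,                              # assumption; warmup not specified here
    num_training_steps=num_epochs * steps_per_epoch, # linear decay over all training steps
)

for step in range(num_epochs * steps_per_epoch):
    # ... forward pass on a batch of `batch_size` examples, compute loss, loss.backward() ...
    optimizer.step()       # update the trainable parameters
    scheduler.step()       # linearly decay the learning rate toward zero
    optimizer.zero_grad()
```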