SPP: Sparsity-Preserved Parameter-Efficient Fine-Tuning for Large Language Models

Authors: Xudong Lu, Aojun Zhou, Yuhui Xu, Renrui Zhang, Peng Gao, Hongsheng Li

ICML 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this section, we report a series of experiments to demonstrate the effectiveness of SPP for the efficient training of sparse pruned models.
Researcher Affiliation | Collaboration | Multimedia Laboratory (MMLab), The Chinese University of Hong Kong; Salesforce AI Research; Shanghai Artificial Intelligence Laboratory; CPII under InnoHK.
Pseudocode | No | The paper describes its method with figures and explanations but does not include formal pseudocode or an algorithm block.
Open Source Code | No | Code will be made available at https://github.com/Lucky-Lance/SPP.
Open Datasets | Yes | We use the high-quality instruction fine-tuning dataset Stanford-Alpaca (Taori et al., 2023) to train the pruned models. (A dataset-loading sketch follows the table.)
Dataset Splits | No | The paper uses the Stanford-Alpaca dataset for training but does not explicitly describe train/validation/test splits for this dataset within the fine-tuning process. It uses zero-shot and few-shot evaluation on other benchmarks.
Hardware Specification | Yes | All the training and testing processes are conducted on a server with 8 NVIDIA A100-80GB GPUs.
Software Dependencies | No | We use the AdamW optimizer with default settings in the Transformers package. The paper mentions the Transformers package and refers to its GitHub repository, but does not provide a specific version number for this or other key software components used in the experiments.
Experiment Setup | Yes | For training the 7B/13B/30B/65B/70B models, we use learning rates of 4e-3/2e-3/4e-3/5e-4/5e-4 with the per-device batch size set to 8/4/16/8/8. Following Dettmers et al. (2023), we set a 0.03 warm-up ratio but decay the learning rate after it reaches the targeted peak value. We use the AdamW optimizer with default settings in the Transformers package and add a 0.001 weight decay. (A configuration sketch follows the table.)
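
Dataset-loading sketch for the Open Datasets row. This is a minimal, hedged example assuming the Hugging Face Hub mirror tatsu-lab/alpaca and the datasets library; the paper only cites Stanford-Alpaca (Taori et al., 2023) and does not specify a loading path or prompt template.

    # Minimal sketch: load Stanford-Alpaca instruction data for fine-tuning.
    # The Hub id "tatsu-lab/alpaca" and the prompt template are assumptions,
    # not details taken from the paper.
    from datasets import load_dataset

    def format_example(example: dict) -> dict:
        """Join instruction, optional input, and output into one training prompt."""
        if example["input"]:
            prompt = (
                f"### Instruction:\n{example['instruction']}\n\n"
                f"### Input:\n{example['input']}\n\n"
                f"### Response:\n{example['output']}"
            )
        else:
            prompt = (
                f"### Instruction:\n{example['instruction']}\n\n"
                f"### Response:\n{example['output']}"
            )
        return {"text": prompt}

    alpaca = load_dataset("tatsu-lab/alpaca", split="train")
    alpaca = alpaca.map(format_example)
    print(alpaca[0]["text"][:200])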
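
Training-configuration sketch for the Experiment Setup row, using the 7B values (learning rate 4e-3, per-device batch size 8, 0.03 warm-up ratio, 0.001 weight decay, AdamW). The scheduler type, epoch count, precision, and output directory are assumptions not stated in the excerpt; this is an illustration with standard Hugging Face TrainingArguments, not the authors' released training script.

    # Hedged sketch of the reported 7B fine-tuning hyperparameters expressed as
    # Hugging Face TrainingArguments; for other model sizes, only learning_rate
    # and per_device_train_batch_size change according to the table row above.
    from transformers import TrainingArguments

    training_args = TrainingArguments(
        output_dir="spp-llama-7b",        # hypothetical output path
        learning_rate=4e-3,               # 7B learning rate reported in the paper
        per_device_train_batch_size=8,    # 7B per-device batch size
        warmup_ratio=0.03,                # warm-up ratio following Dettmers et al. (2023)
        lr_scheduler_type="cosine",       # assumption: the paper only says the LR decays after the peak
        weight_decay=0.001,               # weight decay reported in the paper
        optim="adamw_torch",              # AdamW with Transformers defaults
        num_train_epochs=3,               # assumption: not stated in this excerpt
        bf16=True,                        # assumption: a common choice on A100-80GB GPUs
    )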