SPP: Sparsity-Preserved Parameter-Efficient Fine-Tuning for Large Language Models
Authors: Xudong Lu, Aojun Zhou, Yuhui Xu, Renrui Zhang, Peng Gao, Hongsheng Li
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we report a series of experiments to demonstrate the effectiveness of SPP for the efficient training of sparse pruned models. |
| Researcher Affiliation | Collaboration | (1) Multimedia Laboratory (MMLab), The Chinese University of Hong Kong; (2) Salesforce AI Research; (3) Shanghai Artificial Intelligence Laboratory; (4) CPII under InnoHK. |
| Pseudocode | No | The paper describes its method with figures and explanations but does not include a formal pseudocode or algorithm block. |
| Open Source Code | No | Code will be made available at https://github.com/Lucky-Lance/SPP. |
| Open Datasets | Yes | We use the high-quality instruction fine-tuning dataset Stanford-Alpaca (Taori et al., 2023) to train the pruned models. |
| Dataset Splits | No | The paper uses the Stanford-Alpaca dataset for training but does not explicitly describe train/validation/test splits for this dataset within the fine-tuning process. It uses zero-shot and few-shot evaluation on other benchmarks. |
| Hardware Specification | Yes | All the training and testing processes are conducted on a server with 8 NVIDIA A100-80GB GPUs. |
| Software Dependencies | No | We use the AdamW optimizer with default settings in the Transformers package. The paper mentions the "Transformers package" and refers to its GitHub, but does not provide a specific version number for this or other key software components used in the experiments. |
| Experiment Setup | Yes | For training 7B/13B/30B/65B/70B models, we use learning rates of 4e-3/2e-3/4e-3/5e-4/5e-4 with per-device batch size set to 8/4/16/8/8. Following (Dettmers et al., 2023), we set a 0.03 warm-up ratio, but decay the learning rate after reaching the targeted peak value. We use the AdamW optimizer with default settings in the Transformers package and add a 0.001 weight decay. |
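
For reference, the reported hyperparameters for the 7B model map onto a Hugging Face `TrainingArguments` configuration roughly as sketched below. This is a minimal sketch assuming the Transformers `Trainer` API, not the authors' released code; the output directory, scheduler type, epoch count, and precision flag are illustrative assumptions rather than values taken from the paper.

```python
# Minimal sketch of the reported 7B fine-tuning configuration using
# Hugging Face Transformers. Values marked "reported" come from the
# Experiment Setup row above; everything else is an assumption.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="./spp-llama-7b",     # hypothetical output path (assumption)
    learning_rate=4e-3,              # reported learning rate for the 7B model
    per_device_train_batch_size=8,   # reported per-device batch size for 7B
    warmup_ratio=0.03,               # reported warm-up ratio
    lr_scheduler_type="linear",      # assumption: paper only says LR decays after the peak
    weight_decay=0.001,              # reported weight decay
    optim="adamw_torch",             # AdamW with default settings, as reported
    num_train_epochs=3,              # assumption: not specified in this table
    bf16=True,                       # assumption: plausible on A100-80GB GPUs
)
```

The same template would apply to the larger models by swapping in the corresponding learning rate and batch size from the row above.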