SPP: Sparsity-Preserved Parameter-Efficient Fine-Tuning for Large Language Models
Authors: Xudong Lu, Aojun Zhou, Yuhui Xu, Renrui Zhang, Peng Gao, Hongsheng Li
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we report a series of experiments to demonstrate the effectiveness of SPP for the efficient training of sparse pruned models. |
| Researcher Affiliation | Collaboration | (1) Multimedia Laboratory (MMLab), The Chinese University of Hong Kong; (2) Salesforce AI Research; (3) Shanghai Artificial Intelligence Laboratory; (4) CPII under InnoHK. |
| Pseudocode | No | The paper describes its method with figures and explanations but does not include a formal pseudocode or algorithm block. |
| Open Source Code | No | Code will be made available at https://github.com/Lucky-Lance/SPP. |
| Open Datasets | Yes | We use the high-quality instruction fine-tuning dataset Stanford-Alpaca (Taori et al., 2023) to train the pruned models. |
| Dataset Splits | No | The paper uses the Stanford-Alpaca dataset for training but does not explicitly describe train/validation/test splits for this dataset within the fine-tuning process. It uses zero-shot and few-shot evaluation on other benchmarks. |
| Hardware Specification | Yes | All the training and testing processes are conducted on a server with 8 NVIDIA A100-80GB GPUs. |
| Software Dependencies | No | We use the AdamW optimizer with default settings in the Transformers package. The paper mentions the "Transformers package" and refers to its GitHub, but does not provide a specific version number for this or other key software components used in the experiments. |
| Experiment Setup | Yes | For training 7B/13B/30B/65B/70B models, we use learning rates of 4e-3/2e-3/4e-3/5e-4/5e-4 with per-device batch size set to 8/4/16/8/8. Following (Dettmers et al., 2023), we set a 0.03 warm-up ratio, but decay the learning rate after reaching the targeted peak value. We use the AdamW optimizer with default settings in the Transformers package and add a 0.001 weight decay. |
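
For reference, the reported hyperparameters for the 7B model map onto a Hugging Face `TrainingArguments` configuration roughly as sketched below. This is a minimal sketch assuming the Transformers `Trainer` API, not the authors' released code; the output directory, scheduler type, epoch count, and precision flag are illustrative assumptions rather than values taken from the paper.

```python
# Minimal sketch of the reported 7B fine-tuning configuration using
# Hugging Face Transformers. Values marked "reported" come from the
# Experiment Setup row above; everything else is an assumption.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="./spp-llama-7b",     # hypothetical output path (assumption)
    learning_rate=4e-3,              # reported learning rate for the 7B model
    per_device_train_batch_size=8,   # reported per-device batch size for 7B
    warmup_ratio=0.03,               # reported warm-up ratio
    lr_scheduler_type="linear",      # assumption: paper only says LR decays after the peak
    weight_decay=0.001,              # reported weight decay
    optim="adamw_torch",             # AdamW with default settings, as reported
    num_train_epochs=3,              # assumption: not specified in this table
    bf16=True,                       # assumption: plausible on A100-80GB GPUs
)
```

The same template would apply to the larger models by swapping in the corresponding learning rate and batch size from the row above.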