DISP-LLM: Dimension-Independent Structural Pruning for Large Language Models

Authors: Shangqian Gao, Chi-Heng Lin, Ting Hua, Zheng Tang, Yilin Shen, Hongxia Jin, Yen-Chang Hsu

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate our method on various LLMs, including OPT, LLaMA, LLaMA-2, Phi-1.5, and Phi-2. Experimental results demonstrate that our approach outperforms other state-of-the-art methods, showing for the first time that structural pruning can achieve an accuracy similar to semi-structural pruning.
Researcher Affiliation | Collaboration | Shangqian Gao (Florida State University); Chi-Heng Lin, Ting Hua, Zheng Tang, Yilin Shen, Hongxia Jin, Yen-Chang Hsu (Samsung Research America)
Pseudocode | Yes | Algorithm 1: Block inference after pruning. (See the illustrative sketch below the table.)
Open Source Code | No | Due to company policy, the code will only be released after going through the internal review process.
Open Datasets | Yes | Following previous papers [2, 30], we use the WikiText-2 and Alpaca datasets to train the hypernetwork.
Dataset Splits | No | The paper mentions using the WikiText-2 and Alpaca datasets to train the hypernetwork but does not specify explicit training, validation, or test splits with percentages or sample counts for these datasets.
Hardware Specification | Yes | Depending on the size of the base model, we use 1 to 4 NVIDIA A100 GPUs to train the hypernetwork.
Software Dependencies | No | The paper mentions 'PyTorch [32] and the Hugging Face transformer library [41]' but does not specify version numbers for these software components.
Experiment Setup | Yes | The hypernetwork is trained for 10,000 iterations for all models. For all experiments, we set λ in Obj. 5 to 6. While training the hypernetwork, we use the AdamW optimizer with a constant learning rate of 10^-3 and a weight decay of 0.05. We always set the mini-batch size to 1 on each GPU. (A configuration sketch appears below.)
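
The report only names Algorithm 1 (block inference after pruning) without reproducing its steps. As a rough, hypothetical illustration of what dimension-independent pruning implies at inference time, the PyTorch sketch below assumes each pruned block keeps its own index set of hidden dimensions: it gathers those dimensions from the full-width residual stream, runs the pruned sub-layer at reduced width, and scatters the result back. The class PrunedBlock and the names in_idx, out_idx, and d_hidden are illustrative assumptions, not taken from the paper.

```python
import torch
import torch.nn as nn

class PrunedBlock(nn.Module):
    """Hypothetical block operating on its own subset of hidden dimensions.

    The residual stream keeps the full width d_model; this block reads only the
    dimensions in `in_idx` and writes only the dimensions in `out_idx`, so each
    block can pick a different subset (the dimension-independent idea).
    """

    def __init__(self, d_model, in_idx, out_idx, d_hidden):
        super().__init__()
        assert int(in_idx.max()) < d_model and int(out_idx.max()) < d_model
        self.register_buffer("in_idx", in_idx)    # selected input dimensions
        self.register_buffer("out_idx", out_idx)  # selected output dimensions
        # Pruned weights act only on the reduced widths.
        self.fc1 = nn.Linear(len(in_idx), d_hidden)
        self.fc2 = nn.Linear(d_hidden, len(out_idx))
        self.act = nn.GELU()

    def forward(self, x):                          # x: (batch, seq, d_model)
        dim = x.dim() - 1
        h = x.index_select(dim, self.in_idx)       # gather the block's input dims
        h = self.fc2(self.act(self.fc1(h)))        # run the pruned sub-layer
        out = x.clone()                            # residual stream stays full-width
        out.index_add_(dim, self.out_idx, h)       # scatter the update back
        return out

# Toy usage: the block reads dimensions {0, 2, 4, 6} and updates {1, 3, 5, 7}.
block = PrunedBlock(d_model=8,
                    in_idx=torch.tensor([0, 2, 4, 6]),
                    out_idx=torch.tensor([1, 3, 5, 7]),
                    d_hidden=16)
y = block(torch.randn(1, 4, 8))
print(y.shape)  # torch.Size([1, 4, 8])
```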
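
The quoted experiment setup (AdamW, constant learning rate 10^-3, weight decay 0.05, 10,000 iterations, mini-batch size 1 per GPU, WikiText-2 and Alpaca as training data) maps to a short PyTorch configuration. The sketch below is a minimal skeleton under those stated hyperparameters; the Hugging Face dataset identifiers, the stand-in hypernetwork, and the dummy loss are assumptions made only to keep the example self-contained and do not reproduce the paper's Obj. 5.

```python
import torch
import torch.nn as nn
from datasets import load_dataset

# Calibration data named in the report; the Hugging Face identifiers are
# assumptions, not taken from the paper.
wikitext = load_dataset("wikitext", "wikitext-2-raw-v1", split="train")
alpaca = load_dataset("tatsu-lab/alpaca", split="train")

# Stand-in hypernetwork: the paper's hypernetwork (which produces pruning
# decisions) is not specified here, so a tiny MLP keeps the sketch runnable.
hypernetwork = nn.Sequential(nn.Linear(32, 128), nn.ReLU(), nn.Linear(128, 32))

# Hyperparameters quoted in the report: AdamW, constant lr 1e-3,
# weight decay 0.05, 10,000 iterations, mini-batch size 1 per GPU.
optimizer = torch.optim.AdamW(hypernetwork.parameters(), lr=1e-3, weight_decay=0.05)
NUM_ITERATIONS = 10_000
BATCH_SIZE_PER_GPU = 1

for step in range(NUM_ITERATIONS):
    # A real run would tokenize one WikiText-2 or Alpaca sample per GPU and
    # evaluate the paper's objective (Obj. 5 with lambda = 6); a dummy loss is
    # used here so the skeleton executes end to end.
    dummy_input = torch.randn(BATCH_SIZE_PER_GPU, 32)
    loss = hypernetwork(dummy_input).pow(2).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

Multi-GPU runs (1 to 4 A100s, per the hardware row) would typically wrap this loop in data-parallel training, which is omitted here for brevity.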