DISP-LLM: Dimension-Independent Structural Pruning for Large Language Models
Authors: Shangqian Gao, Chi-Heng Lin, Ting Hua, Zheng Tang, Yilin Shen, Hongxia Jin, Yen-Chang Hsu
NeurIPS 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate our method on various LLMs, including OPT, LLaMA, LLaMA-2, Phi-1.5, and Phi-2. Experimental results demonstrate that our approach outperforms other state-of-the-art methods, showing for the first time that structural pruning can achieve an accuracy similar to semi-structural pruning. |
| Researcher Affiliation | Collaboration | Shangqian Gao (Florida State University); Chi-Heng Lin, Ting Hua, Zheng Tang, Yilin Shen, Hongxia Jin, Yen-Chang Hsu (Samsung Research America) |
| Pseudocode | Yes | Algorithm 1: Block inference after pruning. |
| Open Source Code | No | Due to the company policy, the code will only be released after going through the internal review process. |
| Open Datasets | Yes | Following previous papers [2, 30], we use WikiText-2 and Alpaca datasets to train the hypernetwork. (See the dataset-loading sketch below the table.) |
| Dataset Splits | No | The paper mentions using WikiText-2 and Alpaca datasets to train the hypernetwork but does not specify explicit training, validation, or test splits with percentages or sample counts for these datasets. |
| Hardware Specification | Yes | Depending on the size of the base model, we use 1 to 4 NVIDIA A100 GPUs to train the hypernetwork. |
| Software Dependencies | No | The paper mentions 'Pytorch [32] and Hugging Face transformer library [41]' but does not specify version numbers for these software components. |
| Experiment Setup | Yes | The hypernetwork is trained for 10,000 iterations for all models. For all experiments, we set λ in Obj. 5 to 6. During training of the hypernetwork, we use the AdamW optimizer with a constant learning rate of 1 × 10⁻³ and weight decay 0.05. We always set the mini-batch size to 1 on each GPU. (See the training-loop sketch below the table.) |
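
The Open Datasets row names WikiText-2 and Alpaca as the corpora used to train the hypernetwork. Below is a minimal loading sketch using the Hugging Face `datasets` library; the Hub identifiers (`wikitext` with config `wikitext-2-raw-v1`, and `tatsu-lab/alpaca`) and the choice of the `train` split are assumptions for illustration, not details taken from the paper.

```python
# Sketch: load the two corpora named in the paper via Hugging Face `datasets`.
# The dataset identifiers and split choice below are assumptions, not the
# authors' preprocessing pipeline.
from datasets import load_dataset

wikitext2 = load_dataset("wikitext", "wikitext-2-raw-v1", split="train")
alpaca = load_dataset("tatsu-lab/alpaca", split="train")

print(f"WikiText-2 rows: {len(wikitext2)}, Alpaca rows: {len(alpaca)}")
```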
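
The Experiment Setup row fully specifies the reported optimizer configuration (AdamW, constant learning rate 1e-3, weight decay 0.05, 10,000 iterations, mini-batch size 1 per GPU, λ = 6 in Obj. 5). The sketch below only wires those numbers into a plain PyTorch loop; the hypernetwork module and the loss are stand-in stubs, since the paper's actual hypernetwork architecture and pruning objective are not reproduced here.

```python
# Sketch of the reported hypernetwork training configuration.
# The model and loss are placeholder stubs, not the authors' code.
import torch
import torch.nn as nn

# Stand-in hypernetwork; the real one produces structural pruning decisions.
hypernetwork = nn.Sequential(nn.Linear(128, 256), nn.ReLU(), nn.Linear(256, 128))

# AdamW with the reported constant learning rate and weight decay.
optimizer = torch.optim.AdamW(hypernetwork.parameters(), lr=1e-3, weight_decay=0.05)

for step in range(10_000):                     # 10,000 iterations for all models
    batch = torch.randn(1, 128)                # mini-batch size of 1 per GPU
    loss = hypernetwork(batch).pow(2).mean()   # placeholder for Obj. 5 (lambda = 6)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```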