Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
DISP-LLM: Dimension-Independent Structural Pruning for Large Language Models
Authors: Shangqian Gao, Chi-Heng Lin, Ting Hua, Zheng Tang, Yilin Shen, Hongxia Jin, Yen-Chang Hsu
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate our method on various LLMs, including OPT, LLaMA, LLaMA-2, Phi-1.5, and Phi-2. Experimental results demonstrate that our approach outperforms other state-of-the-art methods, showing for the first time that structural pruning can achieve an accuracy similar to semi-structural pruning. |
| Researcher Affiliation | Collaboration | Shangqian Gao (Florida State University); Chi-Heng Lin (Samsung Research America); Ting Hua (Samsung Research America); Tang Zheng (Samsung Research America); Yilin Shen (Samsung Research America); Hongxia Jin (Samsung Research America); Yen-Chang Hsu (Samsung Research America) |
| Pseudocode | Yes | Algorithm 1: Block inference after pruning. |
| Open Source Code | No | Due to the company policy, the code will only be released after going through the internal review process. |
| Open Datasets | Yes | Following previous papers [2, 30], we use WikiText-2 and Alpaca datasets to train the hypernetwork. |
| Dataset Splits | No | The paper mentions using WikiText-2 and Alpaca datasets to train the hypernetwork but does not specify explicit training, validation, or test splits with percentages or sample counts for these datasets. |
| Hardware Specification | Yes | Depending on the size of the base model, we use 1 to 4 NVIDIA A100 GPUs to train the hypernetwork. |
| Software Dependencies | No | The paper mentions 'Pytorch [32] and Hugging Face transformer library [41]' but does not specify version numbers for these software components. |
| Experiment Setup | Yes | The hypernetwork is trained for 10,000 iterations for all models. For all experiments, we set λ in Obj. 5 to 6. During training the hypernetwork, we use AdamW optimizer to optimize it with a constant learning rate 10⁻³ and weight decay 0.05. We always set the mini-batch size to 1 on each GPU. |
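
The hyperparameters quoted in the Experiment Setup row can be collected into a small configuration object, which may help when trying to reproduce the setup. The following is a minimal sketch, not the authors' code (which is unreleased): the class name `HypernetTrainConfig`, its field names, and the `samples_seen` helper are all hypothetical, the learning rate is taken to be 10⁻³, and the sample count assumes data-parallel training in which each GPU consumes its own mini-batch per iteration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HypernetTrainConfig:
    """Hypothetical container for the settings quoted in the table above."""

    iterations: int = 10_000       # "trained for 10,000 iterations for all models"
    lam: float = 6.0               # lambda in Obj. 5 of the paper
    optimizer: str = "AdamW"
    learning_rate: float = 1e-3    # constant; assumed to be 10^-3
    weight_decay: float = 0.05
    batch_size_per_gpu: int = 1    # "mini-batch size to 1 on each GPU"
    num_gpus: int = 1              # the paper reports 1 to 4 A100 GPUs

    def __post_init__(self):
        # The table only reports runs on 1 to 4 GPUs, so reject anything else.
        if not 1 <= self.num_gpus <= 4:
            raise ValueError("the paper reports using 1 to 4 GPUs")

    def samples_seen(self) -> int:
        # Assumption: data-parallel training, one distinct mini-batch per GPU
        # per iteration. The paper does not state this explicitly.
        return self.iterations * self.batch_size_per_gpu * self.num_gpus
```

For example, `HypernetTrainConfig(num_gpus=4).samples_seen()` gives the number of training samples processed under the stated data-parallel assumption. Values not reported in the table, such as library versions or dataset splits, are deliberately left out.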