Discovering Sparsity Allocation for Layer-wise Pruning of Large Language Models

Authors: Lujun Li, Peijie Dong, Zhenheng Tang, Xiang Liu, Qiang Wang, Wenhan Luo, Wei Xue, Qifeng Liu, Xiaowen Chu, Yike Guo

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We conduct extensive experiments on multiple challenging tasks such as arithmetic, knowledge reasoning, and multimodal benchmarks spanning GSM8K, MMLU, SQA, and VQA, demonstrating that our DSA method achieves significant performance gains on the LLaMA-1/2/3, Mistral, and OPT models.
Researcher Affiliation | Academia | 1 Hong Kong University of Science and Technology; 2 Hong Kong University of Science and Technology (Guangzhou); 3 Hong Kong Baptist University; 4 Harbin Institute of Technology (Shenzhen)
Pseudocode | Yes | Algorithm 1: Evolutionary Search for Allocation Function Discovery (a hedged sketch of this loop appears after the table).
Open Source Code | Yes | Codes at: https://github.com/lliai/DSA
Open Datasets | Yes | We employ a set of seven tasks sourced from the EleutherAI LM Harness [50]... GSM8K [8] and MMLU [22] datasets... VQAv2 [17], SQA [37], and VQA [47].
Dataset Splits | Yes | This involves computing the sparsity ratios by applying the candidate function to the sparsity metric, evaluating the pruned model on a validation set using a performance metric, and checking whether the pruned model's size satisfies the given constraint... we allocate 20% of the original dataset's training set as a held-out test set for the search process. We meticulously confirm that these validation datasets do not overlap with the test set, preventing any potential data leakage or bias in our evaluations. (A sketch of this fitness evaluation appears after the table.)
Hardware Specification | Yes | In this way, we search our allocation function in only 0.5 days on a single NVIDIA H800 GPU server, based on Wanda, using perplexity results from the validation set of LLaMA-1-7B on WikiText-2 [41].
Software Dependencies | No | The paper does not provide specific version numbers for key software components or libraries, only mentioning general tools like 'Wanda' and 'SparseGPT' without version details.
Experiment Setup | Yes | During the search phase, we configure the evolutionary algorithm (Algorithm 1) with a population size of 20, a maximum of 1,000 iterations, a sample ratio of 0.9, and a top-k value of 5.
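
The "Dataset Splits" row describes how each candidate allocation function is scored: apply it to per-layer sparsity metrics to get per-layer ratios, verify the pruned model's size satisfies the global constraint, and evaluate the pruned model on the validation set. Below is a minimal Python sketch of that fitness evaluation, assuming a perplexity objective; the names `layer_metrics`, `prune_and_eval_ppl`, `TARGET_SPARSITY`, and the renormalization step are illustrative assumptions, not the authors' actual code.

```python
# Hedged sketch of the fitness evaluation quoted in the "Dataset Splits" row.
# All names below (layer_metrics, prune_and_eval_ppl, TARGET_SPARSITY) are
# hypothetical stand-ins, not the paper's exact API.

import numpy as np

TARGET_SPARSITY = 0.5   # assumed global pruning budget (fraction of weights)
TOLERANCE = 0.01        # assumed slack allowed on the size constraint

def allocate(candidate_fn, layer_metrics, target=TARGET_SPARSITY):
    """Map per-layer sparsity metrics to per-layer ratios with a candidate
    allocation function, then rescale so the mean ratio matches the budget."""
    raw = np.array([candidate_fn(m) for m in layer_metrics])
    return np.clip(raw * target / max(raw.mean(), 1e-8), 0.0, 0.95)

def fitness(candidate_fn, layer_metrics, prune_and_eval_ppl):
    """Score a candidate: lower validation perplexity is better, and
    allocations violating the size constraint are rejected outright."""
    ratios = allocate(candidate_fn, layer_metrics)
    if abs(ratios.mean() - TARGET_SPARSITY) > TOLERANCE:  # size constraint
        return float("-inf")
    ppl = prune_and_eval_ppl(ratios)  # prune model, measure validation PPL
    return -ppl
```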
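The search itself (Algorithm 1, quoted in the "Pseudocode" and "Experiment Setup" rows) is an evolutionary loop configured with a population size of 20, 1,000 iterations, a sample ratio of 0.9, and top-k of 5. The sketch below wires those stated hyperparameters into a standard sample-and-mutate loop; the candidate encoding and the `random_candidate` and `mutate` operators are hypothetical, since the paper's exact operators are not quoted here.

```python
# Hedged sketch of the evolutionary search loop (Algorithm 1) using the
# hyperparameters quoted in the "Experiment Setup" row. The mutation operator
# and candidate encoding are assumptions, not the paper's exact design.

import random

POP_SIZE, MAX_ITERS, SAMPLE_RATIO, TOP_K = 20, 1000, 0.9, 5

def evolutionary_search(random_candidate, mutate, fitness):
    """Return the best allocation function found under the stated budget."""
    population = [random_candidate() for _ in range(POP_SIZE)]
    scores = [fitness(c) for c in population]
    for _ in range(MAX_ITERS):
        # Sample a fraction of the population and keep its top-k candidates.
        idx = random.sample(range(POP_SIZE), int(SAMPLE_RATIO * POP_SIZE))
        top = sorted(idx, key=lambda i: scores[i], reverse=True)[:TOP_K]
        # Mutate one of the top candidates to produce a child.
        child = mutate(population[random.choice(top)])
        child_score = fitness(child)
        # Replace the current worst member if the child improves on it.
        worst = min(range(POP_SIZE), key=lambda i: scores[i])
        if child_score > scores[worst]:
            population[worst], scores[worst] = child, child_score
    best = max(range(POP_SIZE), key=lambda i: scores[i])
    return population[best], scores[best]
```

The two sketches compose directly: pass `lambda c: fitness(c, layer_metrics, prune_and_eval_ppl)` from the first sketch as the `fitness` argument here.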