Pruner-Zero: Evolving Symbolic Pruning Metric From Scratch for Large Language Models

Authors: Peijie Dong, Lujun Li, Zhenheng Tang, Xiang Liu, Xinglin Pan, Qiang Wang, Xiaowen Chu

ICML 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments on LLaMA and LLaMA-2 on language modeling and zero-shot tasks demonstrate that our Pruner-Zero obtains superior performance than SOTA post-training pruning methods.
Researcher Affiliation | Academia | 1 The Hong Kong University of Science and Technology (Guangzhou), 2 The Hong Kong University of Science and Technology, 3 Hong Kong Baptist University, 4 Harbin Institute of Technology, Shenzhen.
Pseudocode | Yes | Algorithm 1: Evolution Search for Pruner-Zero (a hedged sketch of such a loop appears after the table).
Open Source Code | Yes | Code at: https://github.com/pprp/Pruner-Zero.
Open Datasets | Yes | We conduct fast post-training pruning evaluations for each SPM for LLaMA-7B on the WikiText2 dataset to get the perplexity of its fitness in less than 5 minutes. Additionally, we employ LoRA (Hu et al., 2022) (r = 8) for fine-tuning on the C4 training dataset (Raffel et al., 2019) using 1 GPU and 12 hours, targeting the auto-regressive loss. (See the LoRA configuration sketch after the table.)
Dataset Splits | Yes | The performance of the pruned models is evaluated in terms of language modeling and zero-shot tasks. For language modeling, we follow the established protocols in LLM compression research (Sun et al., 2024; Frantar & Alistarh, 2023) to assess the perplexity on the WikiText2 (Merity et al., 2017) validation set. (See the perplexity sketch after the table.)
Hardware Specification | Yes | The genetic programming tasks are executed on two NVIDIA 4090 GPUs, while the generalization experiments involving zero-shot tasks and language modeling on LLaMA-2-70B are conducted using 8 A100 GPUs.
Software Dependencies | No | The paper mentions using the LightLLM framework and generally refers to common tools and models, but it does not provide specific version numbers for software dependencies such as Python, PyTorch, or LightLLM itself.
Experiment Setup | Yes | The evolutionary search starts with an initial population of 50 and 300 iterations. The depth of symbolic trees ranges from 3 to 5. Tournament selection utilizes a top-K parameter of 10, selecting two parent symbolic pruning metrics from the 10 best-performing candidates. The mutation probability is set to 0.5. This search, focused on identifying an optimal symbolic pruning metric, is executed using the LLaMA-2-7B model. Perplexity is evaluated under unstructured pruning with 50% sparsity. (These hyperparameters are reused in the search sketch after the table.)
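
The Experiment Setup row pins down the search hyperparameters (population 50, 300 iterations, symbolic-tree depth 3 to 5, tournament top-K of 10, mutation probability 0.5). A minimal sketch of how such an evolution loop could be wired together is given below; the `random_tree`, `crossover`, `mutate`, and `evaluate_ppl` helpers are hypothetical placeholders, and the loop structure is an assumption rather than the authors' released implementation (see the linked repository for that).

```python
import random

# Hyperparameters quoted in the Experiment Setup row.
POPULATION, ITERATIONS, TOP_K, MUTATION_P = 50, 300, 10, 0.5
MIN_DEPTH, MAX_DEPTH = 3, 5  # depth range of the symbolic trees


def evolve(random_tree, crossover, mutate, evaluate_ppl):
    """Evolve a symbolic pruning metric; all four callbacks are hypothetical."""
    # Each individual is a (fitness, tree) pair; lower WikiText2 perplexity is fitter.
    population = []
    for _ in range(POPULATION):
        tree = random_tree(random.randint(MIN_DEPTH, MAX_DEPTH))
        population.append((evaluate_ppl(tree), tree))

    for _ in range(ITERATIONS):
        population.sort(key=lambda pair: pair[0])
        # Tournament selection: two parents drawn from the 10 best candidates.
        (_, p1), (_, p2) = random.sample(population[:TOP_K], 2)
        child = crossover(p1, p2)
        if random.random() < MUTATION_P:
            child = mutate(child)
        # Replace the worst-performing individual with the new offspring.
        population[-1] = (evaluate_ppl(child), child)

    population.sort(key=lambda pair: pair[0])
    return population[0][1]  # best symbolic pruning metric found
```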
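
Both the fitness signal during the search and the final language-modeling evaluation reduce to WikiText2 perplexity on the validation set (per the Dataset Splits row). A minimal sketch of that measurement, assuming the HuggingFace `transformers` and `datasets` libraries, is shown below; the checkpoint name, 2048-token context, and chunking scheme are assumptions rather than the paper's exact protocol.

```python
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "meta-llama/Llama-2-7b-hf"  # stand-in; a pruned checkpoint would be loaded in practice
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype=torch.float16, device_map="auto"
)
model.eval()

# WikiText2 validation split, concatenated and cut into fixed-length segments.
data = load_dataset("wikitext", "wikitext-2-raw-v1", split="validation")
enc = tokenizer("\n\n".join(data["text"]), return_tensors="pt")

seq_len = 2048  # assumed context length
n_chunks = enc.input_ids.size(1) // seq_len
nlls = []
for i in range(n_chunks):
    ids = enc.input_ids[:, i * seq_len:(i + 1) * seq_len].to(model.device)
    with torch.no_grad():
        loss = model(ids, labels=ids).loss  # mean next-token negative log-likelihood
    nlls.append(loss.float() * seq_len)

ppl = torch.exp(torch.stack(nlls).sum() / (n_chunks * seq_len))
print(f"WikiText2 validation perplexity: {ppl.item():.2f}")
```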
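
The Open Datasets row also quotes the recovery fine-tuning recipe: LoRA with rank r = 8 on the C4 training set under the auto-regressive loss. A minimal configuration sketch, assuming the HuggingFace `peft`, `transformers`, and `datasets` libraries, follows; the target modules, `lora_alpha`, and the streaming setup are assumptions, and the training loop itself (standard causal-LM fine-tuning) is omitted.

```python
from datasets import load_dataset
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM, AutoTokenizer

base = "meta-llama/Llama-2-7b-hf"  # in practice, the pruned model would be loaded here
model = AutoModelForCausalLM.from_pretrained(base)
tokenizer = AutoTokenizer.from_pretrained(base)

# LoRA with rank r = 8, as quoted above; alpha and target modules are assumptions.
lora_cfg = LoraConfig(
    r=8, lora_alpha=16, target_modules=["q_proj", "v_proj"], task_type="CAUSAL_LM"
)
model = get_peft_model(model, lora_cfg)
model.print_trainable_parameters()

# Stream the C4 training split; tokenized batches would then feed a standard
# causal language-modeling (auto-regressive) fine-tuning loop.
c4 = load_dataset("allenai/c4", "en", split="train", streaming=True)
```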