Fluctuation-Based Adaptive Structured Pruning for Large Language Models

Authors: Yongqi An, Xu Zhao, Tao Yu, Ming Tang, Jinqiao Wang

AAAI 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We thoroughly evaluate our approach on a variety of language benchmarks. Without any retraining, our method significantly outperforms the state-of-the-art methods, including LLM-Pruner and the extension of Wanda in structured pruning.
Researcher Affiliation | Collaboration | 1. Foundation Model Research Center, Institute of Automation, Chinese Academy of Sciences, Beijing, China; 2. School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing, China; 3. Wuhan AI Research, Wuhan, China; 4. Objecteye Inc., Beijing, China
Pseudocode | No | The paper describes the proposed method in detail with text and mathematical formulas but does not include any pseudocode or algorithm blocks.
Open Source Code | Yes | The code is released at https://github.com/CASIA-IVA-Lab/FLAP.
Open Datasets | Yes | Our evaluation focuses on language modeling performance on the WikiText2 (Merity et al. 2016) validation set and zero-shot performance across seven common sense benchmarks using the EleutherAI LM Harness (Gao et al. 2021).
Dataset Splits | Yes | Our evaluation focuses on language modeling performance on the WikiText2 (Merity et al. 2016) validation set and zero-shot performance across seven common sense benchmarks using the EleutherAI LM Harness (Gao et al. 2021). For this analysis, we set a pruning ratio of 50% for the LLaMA-7B model and observed the resultant perplexity on the WikiText2 dataset.
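For reference, the following is a minimal sketch of the kind of WikiText2 validation perplexity measurement described above, using Hugging Face transformers and datasets. The checkpoint path, context length, and stride are illustrative assumptions, not the paper's exact evaluation code.

```python
# Minimal sketch: perplexity of a (pruned) causal LM on the WikiText2 validation split.
# The model path and the window/stride sizes below are assumptions for illustration only.
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "path/to/pruned-llama-7b"  # hypothetical checkpoint path
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype=torch.float16, device_map="auto"
)
model.eval()

# Concatenate the validation split into one token stream, the usual convention for perplexity.
val = load_dataset("wikitext", "wikitext-2-raw-v1", split="validation")
enc = tokenizer("\n\n".join(val["text"]), return_tensors="pt")

seq_len, stride = 2048, 2048  # assumed context length for LLaMA-7B
nlls, n_tokens = [], 0
for begin in range(0, enc.input_ids.size(1) - 1, stride):
    end = min(begin + seq_len, enc.input_ids.size(1))
    input_ids = enc.input_ids[:, begin:end].to(model.device)
    with torch.no_grad():
        out = model(input_ids, labels=input_ids)  # loss is mean NLL over shifted tokens
    n = input_ids.size(1) - 1                     # number of predicted tokens in this window
    nlls.append(out.loss * n)
    n_tokens += n
    if end == enc.input_ids.size(1):
        break

ppl = torch.exp(torch.stack(nlls).sum() / n_tokens)
print(f"WikiText2 validation perplexity: {ppl.item():.2f}")
```

Sliding the concatenated validation text through the model in fixed-size windows is the common way such perplexity numbers are reported; the exact windowing used by the paper is not specified in this section.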
Hardware Specification | Yes | In this section, we empirically compare the actual parameter counts and inference speeds of different pruning methods, with the experiments conducted on NVIDIA A100 GPUs. The specific hardware is the NVIDIA A100-40G.
Software Dependencies | No | The paper does not provide specific version numbers for any software dependencies (e.g., programming languages, libraries, or frameworks) used in the experiments.
Experiment Setup | Yes | We conduct experiments on the LLaMA model family (LLaMA-7B/13B/30B/65B) to evaluate the efficacy of FLAP. Our evaluation focuses on language modeling performance on the WikiText2 (Merity et al. 2016) validation set and zero-shot performance across seven common sense benchmarks using the EleutherAI LM Harness (Gao et al. 2021). For each of the LLaMA models, we present results at three distinct pruning ratios, as detailed in Table 1. In our experiments, we selected a default setting of 1024 calibration samples. Detailed experimental settings, model descriptions, and evaluation protocols are provided in the Appendix A.
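For the zero-shot portion of the setup, a hedged sketch of how a seven-task evaluation could be run with the EleutherAI LM Evaluation Harness is shown below. The task list (BoolQ, PIQA, HellaSwag, WinoGrande, ARC-easy, ARC-challenge, OpenBookQA), the checkpoint path, and the 0.4.x-style harness API are assumptions; the paper's exact protocol is described in its Appendix A.

```python
# Minimal sketch: zero-shot evaluation with the EleutherAI LM Evaluation Harness.
# Task names, checkpoint path, and the 0.4.x-style API are assumptions; older
# harness versions expose a different interface.
import lm_eval
from lm_eval.models.huggingface import HFLM

model = HFLM(pretrained="path/to/pruned-llama-7b", dtype="float16")  # hypothetical path

results = lm_eval.simple_evaluate(
    model=model,
    tasks=["boolq", "piqa", "hellaswag", "winogrande",
           "arc_easy", "arc_challenge", "openbookqa"],  # assumed seven-task suite
    num_fewshot=0,  # zero-shot setting
)

for task, metrics in results["results"].items():
    print(task, metrics)
```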