Fluctuation-Based Adaptive Structured Pruning for Large Language Models
Authors: Yongqi An, Xu Zhao, Tao Yu, Ming Tang, Jinqiao Wang
AAAI 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We thoroughly evaluate our approach on a variety of language benchmarks. Without any retraining, our method significantly outperforms the state-of-the-art methods, including LLM-Pruner and the extension of Wanda in structured pruning. |
| Researcher Affiliation | Collaboration | 1 Foundation Model Research Center, Institute of Automation, Chinese Academy of Sciences, Beijing, China; 2 School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing, China; 3 Wuhan AI Research, Wuhan, China; 4 Objecteye Inc., Beijing, China |
| Pseudocode | No | The paper describes the proposed method in detail with text and mathematical formulas but does not include any pseudocode or algorithm blocks. |
| Open Source Code | Yes | The code is released at https://github.com/CASIA-IVA-Lab/FLAP. |
| Open Datasets | Yes | Our evaluation focuses on language modeling performance on the WikiText2 (Merity et al. 2016) validation set and zero-shot performance across seven common sense benchmarks using the EleutherAI LM Harness (Gao et al. 2021). |
| Dataset Splits | Yes | Our evaluation focuses on language modeling performance on the WikiText2 (Merity et al. 2016) validation set and zero-shot performance across seven common sense benchmarks using the EleutherAI LM Harness (Gao et al. 2021). For this analysis, we set a pruning ratio of 50% for the LLaMA-7B model and observed the resultant perplexity on the WikiText2 dataset. |
| Hardware Specification | Yes | In this section, we empirically compare the actual parameter counts and inference speeds of different pruning methods, with the experiments conducted on NVIDIA A100 GPUs. The hardware is the NVIDIA A100-40G. |
| Software Dependencies | No | The paper does not provide specific version numbers for any software dependencies (e.g., programming languages, libraries, or frameworks) used in the experiments. |
| Experiment Setup | Yes | We conduct experiments on the LLaMA model family (LLaMA-7B/13B/30B/65B) to evaluate the efficacy of FLAP. Our evaluation focuses on language modeling performance on the WikiText2 (Merity et al. 2016) validation set and zero-shot performance across seven common sense benchmarks using the EleutherAI LM Harness (Gao et al. 2021). For each of the LLaMA models, we present results at three distinct pruning ratios, as detailed in Table 1. In our experiments, we selected a default setting of 1024 calibration samples. Detailed experimental settings, model descriptions, and evaluation protocols are provided in the Appendix A. |
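
The evaluation protocol quoted in the rows above (WikiText2 perplexity for a pruned LLaMA checkpoint) can be reproduced with standard Hugging Face tooling. The sketch below is a minimal illustration, not the authors' script: the checkpoint path is a placeholder, and the `wikitext-2-raw-v1` dataset config and the 2048-token context window are assumptions; FLAP's exact chunking and settings are given in the paper's Appendix A.

```python
# Minimal sketch of WikiText-2 perplexity evaluation for a pruned LLaMA model.
# Assumptions: placeholder checkpoint path, 2048-token context, "wikitext-2-raw-v1" config.
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "path/to/pruned-llama-7b"  # placeholder, not from the paper
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(
    model_path, torch_dtype=torch.float16, device_map="auto"
)
model.eval()

# Concatenate the validation split into one token stream.
data = load_dataset("wikitext", "wikitext-2-raw-v1", split="validation")
enc = tokenizer("\n\n".join(data["text"]), return_tensors="pt")

seq_len = 2048  # assumed context window; the paper's setting may differ
n_chunks = enc.input_ids.size(1) // seq_len
nlls = []
for i in range(n_chunks):
    ids = enc.input_ids[:, i * seq_len : (i + 1) * seq_len].to(model.device)
    with torch.no_grad():
        # labels=ids returns the mean next-token negative log-likelihood of the chunk
        loss = model(ids, labels=ids).loss
    nlls.append(loss.float() * seq_len)

ppl = torch.exp(torch.stack(nlls).sum() / (n_chunks * seq_len))
print(f"WikiText-2 perplexity: {ppl.item():.2f}")
```

Zero-shot accuracy on the seven common-sense benchmarks is typically obtained by pointing the EleutherAI LM Evaluation Harness (Gao et al. 2021) at the same pruned checkpoint; the exact task list and harness configuration used by the authors are detailed in their Appendix A.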