Fluctuation-Based Adaptive Structured Pruning for Large Language Models
Authors: Yongqi An, Xu Zhao, Tao Yu, Ming Tang, Jinqiao Wang
AAAI 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We thoroughly evaluate our approach on a variety of language benchmarks. Without any retraining, our method significantly outperforms the state-of-the-art methods, including LLM-Pruner and the extension of Wanda in structured pruning. |
| Researcher Affiliation | Collaboration | 1 Foundation Model Research Center, Institute of Automation, Chinese Academy of Sciences, Beijing, China; 2 School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing, China; 3 Wuhan AI Research, Wuhan, China; 4 Objecteye Inc., Beijing, China |
| Pseudocode | No | The paper describes the proposed method in detail with text and mathematical formulas but does not include any pseudocode or algorithm blocks. |
| Open Source Code | Yes | The code is released at https://github.com/CASIA-IVA-Lab/FLAP. |
| Open Datasets | Yes | Our evaluation focuses on language modeling performance on the WikiText2 (Merity et al. 2016) validation set and zero-shot performance across seven common sense benchmarks using the EleutherAI LM Harness (Gao et al. 2021). |
| Dataset Splits | Yes | Our evaluation focuses on language modeling performance on the WikiText2 (Merity et al. 2016) validation set and zero-shot performance across seven common sense benchmarks using the EleutherAI LM Harness (Gao et al. 2021). For this analysis, we set a pruning ratio of 50% for the LLaMA-7B model and observed the resultant perplexity on the WikiText2 dataset. |
| Hardware Specification | Yes | In this section, we empirically compare the actual parameter counts and inference speeds of different pruning methods, with the experiments conducted on NVIDIA A100 GPUs. The hardware is the NVIDIA A100-40G. |
| Software Dependencies | No | The paper does not provide specific version numbers for any software dependencies (e.g., programming languages, libraries, or frameworks) used in the experiments. |
| Experiment Setup | Yes | We conduct experiments on the LLaMA model family (LLaMA-7B/13B/30B/65B) to evaluate the efficacy of FLAP. Our evaluation focuses on language modeling performance on the WikiText2 (Merity et al. 2016) validation set and zero-shot performance across seven common sense benchmarks using the EleutherAI LM Harness (Gao et al. 2021). For each of the LLaMA models, we present results at three distinct pruning ratios, as detailed in Table 1. In our experiments, we selected a default setting of 1024 calibration samples. Detailed experimental settings, model descriptions, and evaluation protocols are provided in the Appendix A. |
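
The evaluation protocol quoted in the rows above (WikiText2 perplexity for a pruned LLaMA checkpoint) can be reproduced with standard Hugging Face tooling. The sketch below is a minimal illustration, not the authors' script: the checkpoint path is a placeholder, and the `wikitext-2-raw-v1` dataset config and the 2048-token context window are assumptions; FLAP's exact chunking and settings are given in the paper's Appendix A.

```python
# Minimal sketch of WikiText-2 perplexity evaluation for a pruned LLaMA model.
# Assumptions: placeholder checkpoint path, 2048-token context, "wikitext-2-raw-v1" config.
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "path/to/pruned-llama-7b"  # placeholder, not from the paper
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(
    model_path, torch_dtype=torch.float16, device_map="auto"
)
model.eval()

# Concatenate the validation split into one token stream.
data = load_dataset("wikitext", "wikitext-2-raw-v1", split="validation")
enc = tokenizer("\n\n".join(data["text"]), return_tensors="pt")

seq_len = 2048  # assumed context window; the paper's setting may differ
n_chunks = enc.input_ids.size(1) // seq_len
nlls = []
for i in range(n_chunks):
    ids = enc.input_ids[:, i * seq_len : (i + 1) * seq_len].to(model.device)
    with torch.no_grad():
        # labels=ids returns the mean next-token negative log-likelihood of the chunk
        loss = model(ids, labels=ids).loss
    nlls.append(loss.float() * seq_len)

ppl = torch.exp(torch.stack(nlls).sum() / (n_chunks * seq_len))
print(f"WikiText-2 perplexity: {ppl.item():.2f}")
```

Zero-shot accuracy on the seven common-sense benchmarks is typically obtained by pointing the EleutherAI LM Evaluation Harness (Gao et al. 2021) at the same pruned checkpoint; the exact task list and harness configuration used by the authors are detailed in their Appendix A.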