A Simple and Effective Pruning Approach for Large Language Models
Authors: Mingjie Sun, Zhuang Liu, Anna Bair, J Zico Kolter
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct a thorough evaluation of our method Wanda on LLaMA and LLaMA-2 across various language benchmarks. We empirically evaluate Wanda on the widely adopted LLaMA (Touvron et al., 2023a) and LLaMA-2 (Touvron et al., 2023b) model families. |
| Researcher Affiliation | Collaboration | Carnegie Mellon University, Meta AI Research, Bosch Center for AI |
| Pseudocode | Yes | Algorithm 1: PyTorch code for Wanda (a hedged sketch of this metric is given after the table) |
| Open Source Code | Yes | Code is available at https://github.com/locuslab/wanda. |
| Open Datasets | Yes | To control this variable factor, we use the exact same set of calibration data as SparseGPT, which consists of 128 sequences with context length size 2048 sampled from the C4 training set (Raffel et al., 2020). |
| Dataset Splits | Yes | we evaluate the perplexity on the held-out WikiText (Merity et al., 2016) validation set. (see the perplexity sketch after the table) |
| Hardware Specification | Yes | Specifically, we measure the accumulated time for computing the pruning metric at each layer (excluding the forward pass process shared by both methods) on NVIDIA A6000 GPUs. We evaluate the inference speedup for structured 2:4 sparsity on NVIDIA A6000 GPUs. |
| Software Dependencies | No | The paper mentions 'PyTorch code' in Algorithm 1, but does not provide specific version numbers for PyTorch or any other software dependencies. |
| Experiment Setup | Yes | For all pruning methods, we focus on pruning the linear layers (skipping the first embedding layer and the final classification head), which account for around 99% of the total LLM parameters. We impose a uniform sparsity for all linear layers. We use the exact same set of calibration data as SparseGPT, which consists of 128 sequences with context length size 2048 sampled from the C4 training set (Raffel et al., 2020). We investigate two strategies for fine-tuning LLMs: LoRA (Hu et al., 2021) fine-tuning and full parameter dense fine-tuning. Fine-tuning is conducted on the C4 training dataset and the objective is the pre-training auto-regressive loss. The pruned mask is kept fixed during fine-tuning. We enforce a limited computational budget (1 GPU and 12 hours). (see the layer-wise pruning sketch after the table) |
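
The Pseudocode row cites Algorithm 1, the paper's PyTorch code for Wanda. Below is a minimal sketch of that per-layer metric as described in the paper: each weight is scored by its magnitude times the L2 norm of the corresponding input feature over the calibration tokens, and the lowest-scoring weights are removed within each output row. The function name, the in-place update, and the 50% default sparsity are illustrative choices, not the authors' exact code.

```python
import torch

def wanda_prune_layer(W: torch.Tensor, X: torch.Tensor, sparsity: float = 0.5) -> torch.Tensor:
    """Sketch of Wanda-style pruning for one linear layer.

    W: weight matrix of shape (out_features, in_features), modified in place.
    X: calibration activations flattened to (num_tokens, in_features).
    """
    # Score each weight by |W_ij| * ||X_j||_2, where ||X_j||_2 is the norm of input feature j.
    metric = W.abs() * X.norm(p=2, dim=0)

    # Within each output row (Wanda's comparison group), find the smallest scores.
    n_prune = int(W.shape[1] * sparsity)
    _, sorted_idx = torch.sort(metric, dim=1)   # ascending per row
    pruned_idx = sorted_idx[:, :n_prune]

    # Zero out the selected weights; the mask is applied once, with no retraining.
    W.scatter_(1, pruned_idx, 0.0)
    return W
```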
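The Experiment Setup row states that a uniform sparsity is imposed on all linear layers while the embedding layer and the final classification head are skipped. The sketch below shows one way to apply the per-layer function above across a decoder-only model; `prune_all_linear_layers`, `calib_inputs`, and the name-based skip list are assumptions of this sketch, and collecting the calibration activations (e.g. with forward hooks over the 128 C4 sequences) is not shown.

```python
import torch
import torch.nn as nn

@torch.no_grad()
def prune_all_linear_layers(model: nn.Module, calib_inputs: dict, sparsity: float = 0.5) -> None:
    """Apply one uniform sparsity ratio to the linear layers of a decoder-only LM.

    calib_inputs maps a module name to its captured calibration activations,
    flattened to (num_tokens, in_features).
    """
    skip = ("embed", "lm_head")  # leave the embedding and classification head dense
    for name, module in model.named_modules():
        if not isinstance(module, nn.Linear):
            continue
        if any(s in name for s in skip) or name not in calib_inputs:
            continue
        # Prune this layer's weights in place with the same sparsity everywhere.
        wanda_prune_layer(module.weight.data, calib_inputs[name], sparsity)
```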
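The Dataset Splits row notes that perplexity is measured on held-out WikiText. A common way to compute this, sketched below, concatenates the split and scores non-overlapping windows; the dataset config ("wikitext-2-raw-v1"), the validation split, and the 2048-token window are assumptions, since the paper only names WikiText.

```python
import torch
from datasets import load_dataset

@torch.no_grad()
def wikitext_perplexity(model, tokenizer, seq_len: int = 2048, device: str = "cuda") -> float:
    """Perplexity over non-overlapping windows of the concatenated WikiText text."""
    data = load_dataset("wikitext", "wikitext-2-raw-v1", split="validation")
    ids = tokenizer("\n\n".join(data["text"]), return_tensors="pt").input_ids.to(device)

    nlls, n_windows = [], ids.shape[1] // seq_len
    for i in range(n_windows):
        chunk = ids[:, i * seq_len : (i + 1) * seq_len]
        out = model(chunk, labels=chunk)          # loss = mean next-token NLL for this window
        nlls.append(out.loss.float() * seq_len)
    return torch.exp(torch.stack(nlls).sum() / (n_windows * seq_len)).item()
```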