The LLM Surgeon
Authors: Tycho F. A. van der Ouderaa, Markus Nagel, Mart van Baalen, Tijmen Blankevoort
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimentally, our method can prune rows and columns from a range of OPT models and Llamav2-7B by 20%-30%, with a negligible loss in performance, and achieve state-of-the-art results in unstructured and semi-structured pruning of large language models. |
| Researcher Affiliation | Collaboration | ¹Imperial College London, ²Qualcomm AI Research, ³QUVA Lab, University of Amsterdam |
| Pseudocode | Yes | Algorithm 1 LLM Surgeon (structured) |
| Open Source Code | Yes | Code is available at: https://github.com/Qualcomm-AI-research/llm-surgeon. |
| Open Datasets | Yes | We compare compression performance of LLM Surgeon on language modeling tasks on OPT (Zhang et al., 2022) and Llama-v2 (Touvron et al., 2023) model families, using data from wikitext-2 dataset (appendix B.2). |
| Dataset Splits | No | The paper mentions using 'training data set' and 'standard test split', but it does not explicitly define a validation set or its specific split percentage/count for reproducibility. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU or CPU models, memory specifications, or cloud instance types used for running the experiments. |
| Software Dependencies | No | The paper does not provide specific version numbers for any software dependencies (e.g., programming languages, libraries, frameworks) used in the experiments. |
| Experiment Setup | Yes | For compression, we use 128 sequences with a sequence length of 2048 tokens from the training data set and evaluate test perplexity (PPL) on the standard test split. In our experiments, we use a linear sparsity schedule $\alpha_t = 1 - t\left(\frac{1-\alpha}{T}\right)$ at each shot s before reaching the final sparsity α. We use 40 shots at α=0.5 sparsity and report intermediate compression rates, effectively using T=8 shots for α=0.9, T=16 for α=0.8, T=24 for α=0.7, and T=32 for α=0.6. |
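
The linear schedule quoted in the Experiment Setup row can be checked numerically. The sketch below assumes the schedule reads α_t = 1 − t(1 − α)/T, which is the reading consistent with the shot counts listed there. The function name `linear_schedule` is hypothetical and is not taken from the paper or its code release.

```python
from typing import List


def linear_schedule(alpha: float, T: int) -> List[float]:
    """Return [alpha_1, ..., alpha_T] with alpha_t = 1 - t * (1 - alpha) / T.

    Assumption: this is the linear schedule described in the table above;
    alpha_t falls from 1 toward the final value alpha as t goes from 0 to T.
    """
    if T <= 0:
        raise ValueError("T must be a positive number of shots")
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must lie in (0, 1]")
    step = (1.0 - alpha) / T  # constant decrement applied at every shot
    return [1.0 - t * step for t in range(1, T + 1)]


# 40 shots down to alpha = 0.5, as in the quoted setup.
schedule = linear_schedule(alpha=0.5, T=40)

# Intermediate values at the shots named in the table.
for t in (8, 16, 24, 32, 40):
    print(f"shot {t:2d}: alpha_t = {schedule[t - 1]:.2f}")
```

With 40 shots and α = 0.5, each shot lowers α_t by 0.0125. That is why the intermediate rates 0.9, 0.8, 0.7 and 0.6 fall at shots 8, 16, 24 and 32.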