Compressing Large Language Models by Joint Sparsification and Quantization
Authors: Jinyang Guo, Jianyu Wu, Zining Wang, Jiaheng Liu, Ge Yang, Yifu Ding, Ruihao Gong, Haotong Qin, Xianglong Liu
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Comprehensive experiments across various datasets and architectures affirm the efficacy of our JSQ framework. |
| Researcher Affiliation | Collaboration | 1 State Key Laboratory of Complex & Critical Software Environment, Beihang University; 2 Institute of Artificial Intelligence, Beihang University; 3 SenseTime Research; 4 ETH Zurich. |
| Pseudocode | Yes | Algorithm 1 Our JSQ workflow for compressing LLMs |
| Open Source Code | Yes | Our code is released at https://github.com/uanu2002/JSQ. |
| Open Datasets | Yes | We evaluate the JSQ framework on the most widely used LLM model families including LLaMA (Touvron et al., 2023a), LLaMA-2 (Touvron et al., 2023b), and ChatGLM3 (Du et al., 2021). We follow LLaMA's protocol to perform zero-shot classification evaluation on commonly used datasets including PIQA (Bisk et al., 2020), BoolQ (Clark et al., 2019), MMLU (Hendrycks et al., 2020), HellaSwag (Zellers et al., 2019), ARC-easy (Clark et al., 2018), ARC-challenge (Clark et al., 2018), and WinoGrande (Sakaguchi et al., 2021). Following previous works on LLM compression (Sun et al., 2023; Xiao et al., 2023), we also evaluate the perplexity on WikiText-2 (Merity et al., 2016). (A hedged sketch of this perplexity protocol appears after this table.) |
| Dataset Splits | Yes | To calculate the SAR metric, we use 128 samples from C4 (Raffel et al., 2020) as the calibration data. In the simulated annealing search, we set the initial and the final temperature as 300 and 10, respectively. We randomly change one element in the editing vector V and iteratively perform this process to search for the best editing vector V. In this way, we can smoothly edit the relatively useless outliers for subsequent sparsification and quantization process. |
| Hardware Specification | No | The paper discusses computation complexity and memory usage (e.g., in Figure 4), but it does not provide specific details about the hardware used to run the experiments, such as GPU or CPU models. |
| Software Dependencies | No | The paper does not specify any software dependencies with version numbers (e.g., specific libraries, frameworks, or programming language versions) used for the experiments. |
| Experiment Setup | Yes | For fair comparison, we follow (Sun et al., 2023) to utilize uniform sparsity for all linear layers. We set the input length as 2,048. The editing strength choices R are set as {0, 4e-5, 5e-5, 6e-5, 7e-5}. To calculate the SAR metric, we use 128 samples from C4 (Raffel et al., 2020) as the calibration data. In the simulated annealing search, we set the initial and the final temperature as 300 and 10, respectively. (A hedged sketch of this simulated-annealing search appears after this table.) |
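The simulated-annealing search quoted in the Dataset Splits and Experiment Setup rows is specific enough to illustrate. The sketch below is a minimal reconstruction, not the authors' implementation: the temperature endpoints (300 → 10), the candidate editing strengths R, and the one-element-per-step move come from the quoted setup, while the SAR metric (treated here as a black-box callable over the 128 C4 calibration samples), the geometric cooling schedule, the step budget, and the Metropolis acceptance rule are assumptions.

```python
import math
import random

# Candidate per-channel editing strengths from the quoted Experiment Setup.
R = [0.0, 4e-5, 5e-5, 6e-5, 7e-5]

def anneal_editing_vector(num_channels, sar_metric, steps=1000,
                          t_init=300.0, t_final=10.0, seed=0):
    """Search for an editing vector V that minimizes an assumed SAR metric.

    sar_metric: callable taking a list of per-channel strengths and returning
                a scalar computed on the 128 C4 calibration samples (assumption:
                the paper's exact SAR definition is not reproduced here).
    """
    rng = random.Random(seed)
    v = [0.0] * num_channels                 # start from the identity edit
    cost = sar_metric(v)
    best_v, best_cost = list(v), cost
    for step in range(steps):
        # Geometric cooling from t_init down to t_final (assumed schedule).
        t = t_init * (t_final / t_init) ** (step / max(steps - 1, 1))
        # Propose: change a single element of V to another strength in R.
        cand = list(v)
        cand[rng.randrange(num_channels)] = rng.choice(R)
        cand_cost = sar_metric(cand)
        # Standard Metropolis acceptance (assumption: the paper may differ).
        if cand_cost < cost or rng.random() < math.exp((cost - cand_cost) / t):
            v, cost = cand, cand_cost
            if cost < best_cost:
                best_v, best_cost = list(v), cost
    return best_v, best_cost
```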
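The perplexity evaluation quoted in the Open Datasets row follows the protocol of prior compression work (Sun et al., 2023; Xiao et al., 2023). Below is a minimal sketch of that standard WikiText-2 perplexity loop using the 2,048-token input length from the Experiment Setup row; the model identifier is a placeholder, and in practice the compressed JSQ checkpoint would be substituted.

```python
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer

@torch.no_grad()
def wikitext2_perplexity(model_name="huggyllama/llama-7b", seqlen=2048):
    # Placeholder model name; load the compressed checkpoint here instead.
    tok = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(
        model_name, torch_dtype=torch.float16, device_map="auto").eval()
    # Concatenate the raw WikiText-2 test split and tokenize once.
    test = load_dataset("wikitext", "wikitext-2-raw-v1", split="test")
    enc = tok("\n\n".join(test["text"]), return_tensors="pt").input_ids
    nlls, n_chunks = [], enc.size(1) // seqlen
    for i in range(n_chunks):
        batch = enc[:, i * seqlen:(i + 1) * seqlen].to(model.device)
        loss = model(batch, labels=batch).loss   # mean token NLL of the chunk
        nlls.append(loss.float() * seqlen)
    # Perplexity = exp of the average negative log-likelihood per token.
    return torch.exp(torch.stack(nlls).sum() / (n_chunks * seqlen)).item()
```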