Compressing Large Language Models by Joint Sparsification and Quantization
Authors: Jinyang Guo, Jianyu Wu, Zining Wang, Jiaheng Liu, Ge Yang, Yifu Ding, Ruihao Gong, Haotong Qin, Xianglong Liu
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Comprehensive experiments across various datasets and architectures affirm the efficacy of our JSQ framework. |
| Researcher Affiliation | Collaboration | 1 State Key Laboratory of Complex & Critical Software Environment, Beihang University; 2 Institute of Artificial Intelligence, Beihang University; 3 SenseTime Research; 4 ETH Zurich. |
| Pseudocode | Yes | Algorithm 1 Our JSQ workflow for compressing LLMs |
| Open Source Code | Yes | Our code is released at https://github.com/uanu2002/JSQ. |
| Open Datasets | Yes | We evaluate the JSQ framework on the most widely used LLM model families including LLaMA (Touvron et al., 2023a), LLaMA-2 (Touvron et al., 2023b), and ChatGLM3 (Du et al., 2021). We follow LLaMA's protocol to perform zero-shot classification evaluation on commonly used datasets including PIQA (Bisk et al., 2020), BoolQ (Clark et al., 2019), MMLU (Hendrycks et al., 2020), HellaSwag (Zellers et al., 2019), ARC-easy (Clark et al., 2018), ARC-challenge (Clark et al., 2018), and WinoGrande (Sakaguchi et al., 2021). Following previous works on LLM compression (Sun et al., 2023; Xiao et al., 2023), we also evaluate the perplexity on WikiText-2 (Merity et al., 2016). (A hedged sketch of this perplexity protocol appears after this table.) |
| Dataset Splits | Yes | To calculate the SAR metric, we use 128 samples from C4 (Raffel et al., 2020) as the calibration data. In the simulated annealing search, we set the initial and the final temperature as 300 and 10, respectively. We randomly change one element in the editing vector V and iteratively perform this process to search for the best editing vector V. In this way, we can smoothly edit the relatively useless outliers for subsequent sparsification and quantization process. |
| Hardware Specification | No | The paper discusses computation complexity and memory usage (e.g., in Figure 4), but it does not provide specific details about the hardware used to run the experiments, such as GPU or CPU models. |
| Software Dependencies | No | The paper does not specify any software dependencies with version numbers (e.g., specific libraries, frameworks, or programming language versions) used for the experiments. |
| Experiment Setup | Yes | For fair comparison, we follow (Sun et al., 2023) to utilize uniform sparsity for all linear layers. We set the input length as 2,048. The editing strength choices R are set as {0, 4e-5, 5e-5, 6e-5, 7e-5}. To calculate the SAR metric, we use 128 samples from C4 (Raffel et al., 2020) as the calibration data. In the simulated annealing search, we set the initial and the final temperature as 300 and 10, respectively. (A hedged sketch of this simulated-annealing search appears after this table.) |
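The simulated-annealing search quoted in the Dataset Splits and Experiment Setup rows is specific enough to illustrate. The sketch below is a minimal reconstruction, not the authors' implementation: the temperature endpoints (300 → 10), the candidate editing strengths R, and the one-element-per-step move come from the quoted setup, while the SAR metric (treated here as a black-box callable over the 128 C4 calibration samples), the geometric cooling schedule, the step budget, and the Metropolis acceptance rule are assumptions.

```python
import math
import random

# Candidate per-channel editing strengths from the quoted Experiment Setup.
R = [0.0, 4e-5, 5e-5, 6e-5, 7e-5]

def anneal_editing_vector(num_channels, sar_metric, steps=1000,
                          t_init=300.0, t_final=10.0, seed=0):
    """Search for an editing vector V that minimizes an assumed SAR metric.

    sar_metric: callable taking a list of per-channel strengths and returning
                a scalar computed on the 128 C4 calibration samples (assumption:
                the paper's exact SAR definition is not reproduced here).
    """
    rng = random.Random(seed)
    v = [0.0] * num_channels                 # start from the identity edit
    cost = sar_metric(v)
    best_v, best_cost = list(v), cost
    for step in range(steps):
        # Geometric cooling from t_init down to t_final (assumed schedule).
        t = t_init * (t_final / t_init) ** (step / max(steps - 1, 1))
        # Propose: change a single element of V to another strength in R.
        cand = list(v)
        cand[rng.randrange(num_channels)] = rng.choice(R)
        cand_cost = sar_metric(cand)
        # Standard Metropolis acceptance (assumption: the paper may differ).
        if cand_cost < cost or rng.random() < math.exp((cost - cand_cost) / t):
            v, cost = cand, cand_cost
            if cost < best_cost:
                best_v, best_cost = list(v), cost
    return best_v, best_cost
```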
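The perplexity evaluation quoted in the Open Datasets row follows the protocol of prior compression work (Sun et al., 2023; Xiao et al., 2023). Below is a minimal sketch of that standard WikiText-2 perplexity loop using the 2,048-token input length from the Experiment Setup row; the model identifier is a placeholder, and in practice the compressed JSQ checkpoint would be substituted.

```python
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer

@torch.no_grad()
def wikitext2_perplexity(model_name="huggyllama/llama-7b", seqlen=2048):
    # Placeholder model name; load the compressed checkpoint here instead.
    tok = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(
        model_name, torch_dtype=torch.float16, device_map="auto").eval()
    # Concatenate the raw WikiText-2 test split and tokenize once.
    test = load_dataset("wikitext", "wikitext-2-raw-v1", split="test")
    enc = tok("\n\n".join(test["text"]), return_tensors="pt").input_ids
    nlls, n_chunks = [], enc.size(1) // seqlen
    for i in range(n_chunks):
        batch = enc[:, i * seqlen:(i + 1) * seqlen].to(model.device)
        loss = model(batch, labels=batch).loss   # mean token NLL of the chunk
        nlls.append(loss.float() * seqlen)
    # Perplexity = exp of the average negative log-likelihood per token.
    return torch.exp(torch.stack(nlls).sum() / (n_chunks * seqlen)).item()
```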