ExCP: Extreme LLM Checkpoint Compression via Weight-Momentum Joint Shrinking
Authors: Wenshuo Li, Xinghao Chen, Han Shu, Yehui Tang, Yunhe Wang
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We extensively evaluate our proposed ExCP framework on several models ranging from 410M to 7B parameters and demonstrate significant storage reduction while maintaining strong performance. For instance, we achieve approximately 70× compression for the Pythia-410M model, with the final performance being as accurate as the original model on various downstream tasks. |
| Researcher Affiliation | Collaboration | ¹Huawei Noah's Ark Lab ²University of Science and Technology of China. Correspondence to: Xinghao Chen <xinghao.chen@huawei.com>, Yunhe Wang <yunhe.wang@huawei.com>. |
| Pseudocode | Yes | Algorithm 1 Compressing process |
| Open Source Code | Yes | Codes will be available at https://github.com/Gaffey/ExCP. |
| Open Datasets | Yes | We conduct our experiments on ViT-L32 (Dosovitskiy et al., 2020), Pythia-410M (Biderman et al., 2023), PanGu-π-1B and PanGu-π-7B (Wang et al., 2023) models. ... We train Pythia-410M on a subset of the standard Pile (Gao et al., 2020) dataset. |
| Dataset Splits | No | The paper states training on a 'subset of the standard Pile (Gao et al., 2020) dataset' and evaluating on benchmarks like 'HellaSwag, ARC-easy, PIQA, C3, CSL and LAMBADA tasks', but does not provide specific train/validation/test split percentages or counts for any of the datasets used. |
| Hardware Specification | No | The paper mentions general hardware like 'thousands of GPUs or computing cards like TPUs or Ascends' in the introduction but does not specify the exact GPU models, CPU types, or other hardware configurations used for their experiments. |
| Software Dependencies | No | The paper mentions software like 'Adam optimizer', '7zip compression algorithm', 'K-means algorithm', and 'opencompass', but does not provide specific version numbers for any of these software dependencies. |
| Experiment Setup | Yes | Unless otherwise specified, we set the α in equation 5 and β in equation 6 as 5e-5 and 2.0 in our experiments, respectively. The weights except zero are non-uniformly quantized to 2^n − 1 clustering center while the value zero occupies one center. And the bit number n is set as 4 in experiments. |
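
The quantization described in the Experiment Setup row can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `quantize_nonuniform` and `dequantize`, the quantile-based initialization, and the fixed iteration count are all assumptions made for illustration. It only shows the idea of clustering the non-zero values into 2^n − 1 centers while reserving one code for exact zero, with n = 4.

```python
import numpy as np


def quantize_nonuniform(weights, n_bits=4, n_iters=20):
    """Hypothetical helper: 1-D K-means over non-zero values.

    Code 0 is reserved for exact zero; codes 1..2**n_bits - 1 index
    the learned cluster centers. Intended for small arrays only,
    since it builds a (num_nonzero, k) distance matrix.
    """
    w = np.asarray(weights, dtype=np.float64).ravel()
    mask = w != 0.0
    nonzero = w[mask]
    k = 2 ** n_bits - 1

    codes = np.zeros(w.shape, dtype=np.uint8)
    codebook = np.zeros(k + 1, dtype=np.float64)  # codebook[0] stays 0.0
    if nonzero.size == 0:
        return codes, codebook

    # Assumed initialization: spread the centers over the value quantiles.
    centers = np.quantile(nonzero, np.linspace(0.0, 1.0, k))
    for _ in range(n_iters):
        # Assign each value to its nearest center, then update the centers.
        assign = np.argmin(np.abs(nonzero[:, None] - centers[None, :]), axis=1)
        for j in range(k):
            members = nonzero[assign == j]
            if members.size > 0:
                centers[j] = members.mean()

    # Final assignment against the converged centers.
    assign = np.argmin(np.abs(nonzero[:, None] - centers[None, :]), axis=1)
    codebook[1:] = centers
    codes[mask] = (assign + 1).astype(np.uint8)
    return codes, codebook


def dequantize(codes, codebook):
    """Map each code back to its center (code 0 maps to exact zero)."""
    return codebook[codes]
```

With n = 4, each stored value needs only a 4-bit code plus a shared 16-entry codebook, and pruned (zero) entries are reconstructed exactly because they never take part in the clustering.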