ExCP: Extreme LLM Checkpoint Compression via Weight-Momentum Joint Shrinking

Authors: Wenshuo Li, Xinghao Chen, Han Shu, Yehui Tang, Yunhe Wang

ICML 2024

Reproducibility Variable Result LLM Response
Research Type Experimental We extensively evaluate our proposed ExCP framework on several models ranging from 410M to 7B parameters and demonstrate significant storage reduction while maintaining strong performance. For instance, we achieve approximately 70× compression for the Pythia-410M model, with the final performance being as accurate as the original model on various downstream tasks.
Researcher Affiliation Collaboration (1) Huawei Noah's Ark Lab, (2) University of Science and Technology of China. Correspondence to: Xinghao Chen <xinghao.chen@huawei.com>, Yunhe Wang <yunhe.wang@huawei.com>.
Pseudocode Yes Algorithm 1 Compressing process
Open Source Code Yes Codes will be available at https://github.com/Gaffey/ExCP.
Open Datasets Yes We conduct our experiments on ViT-L32 (Dosovitskiy et al., 2020), Pythia-410M (Biderman et al., 2023), PanGu-π-1B and PanGu-π-7B (Wang et al., 2023) models. ... We train Pythia-410M on a subset of the standard Pile (Gao et al., 2020) dataset.
Dataset Splits No The paper states training on a 'subset of the standard Pile (Gao et al., 2020) dataset' and evaluating on benchmarks like 'HellaSwag, ARC-Easy, PIQA, C3, CSL and LAMBADA tasks', but does not provide specific train/validation/test split percentages or counts for any of the datasets used.
Hardware Specification No The paper mentions general hardware like 'thousands of GPUs or computing cards like TPUs or Ascends' in the introduction but does not specify the exact GPU models, CPU types, or other hardware configurations used for their experiments.
Software Dependencies No The paper mentions software like 'Adam optimizer', '7zip compression algorithm', 'K-means algorithm', and 'opencompass', but does not provide specific version numbers for any of these software dependencies.
Experiment Setup Yes Unless otherwise specified, we set α in equation 5 and β in equation 6 to 5e-5 and 2.0 in our experiments, respectively. The nonzero weights are non-uniformly quantized to 2^n - 1 clustering centers, while the value zero occupies one center. The bit number n is set to 4 in experiments.
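The Research Type row reports roughly 70× compression for Pythia-410M. To give a sense of scale, the sketch below estimates raw and compressed checkpoint sizes. It assumes (this is our assumption, not stated in this summary) that a checkpoint stores fp32 weights plus Adam's first and second moments, and it uses the nominal 410M parameter count; the function names are hypothetical.

```python
# Back-of-the-envelope checkpoint size estimate.
# Assumption: fp32 storage for the weights and for both Adam moment buffers.
BYTES_PER_FP32 = 4
TENSORS_PER_PARAM = 3  # weight, first moment, second moment (assumption)

def checkpoint_bytes(num_params: int) -> int:
    """Raw size of a checkpoint holding weights plus both Adam moments."""
    return num_params * TENSORS_PER_PARAM * BYTES_PER_FP32

def compressed_bytes(num_params: int, ratio: float) -> float:
    """Size after applying a given overall compression ratio."""
    return checkpoint_bytes(num_params) / ratio

raw = checkpoint_bytes(410_000_000)
small = compressed_bytes(410_000_000, 70.0)
print(f"raw: {raw / 1e9:.2f} GB, compressed: {small / 1e6:.1f} MB")
```

Under these assumptions a checkpoint of about 4.9 GB shrinks to roughly 70 MB; a different storage precision or optimizer state layout would change both numbers proportionally.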
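Algorithm 1 and the α/β hyperparameters belong to the paper's weight-momentum joint shrinking step, whose exact threshold formulas (equations 5 and 6) are not reproduced in this summary. The following is a minimal NumPy sketch of the general idea under stated assumptions: the residual between consecutive checkpoints is magnitude-pruned against a caller-supplied threshold, and momentum entries are kept only where the weight residual survives. `joint_prune`, `restore`, `w_thresh` and `m_thresh` are hypothetical names, and plain magnitude thresholds stand in for the paper's equations.

```python
import numpy as np

def joint_prune(w_prev, w_curr, m_curr, w_thresh, m_thresh):
    """Hypothetical sketch: prune checkpoint residuals and momentum jointly.

    Plain magnitude thresholds stand in for the paper's equations 5 and 6.
    """
    residual = w_curr - w_prev                    # change since previous checkpoint
    w_mask = np.abs(residual) > w_thresh          # keep only large weight changes
    # Joint step (assumption): momentum survives only where the weight residual survives.
    m_mask = w_mask & (np.abs(m_curr) > m_thresh)
    return residual * w_mask, m_curr * m_mask

def restore(w_prev, pruned_residual):
    """Rebuild approximate current weights from the previous checkpoint."""
    return w_prev + pruned_residual

w_prev = np.array([0.0, 1.0, 2.0, 3.0])
w_curr = np.array([0.5, 1.01, 2.0, 2.0])
m_curr = np.array([0.2, 0.9, 0.9, 0.01])
r, m = joint_prune(w_prev, w_curr, m_curr, w_thresh=0.1, m_thresh=0.1)
print(r, m)  # residual keeps entries 0 and 3; momentum keeps only entry 0
print(restore(w_prev, r))
```

Tying the momentum mask to the weight mask means both tensors share one sparsity pattern, which is the intuition behind shrinking them jointly rather than independently.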
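The Experiment Setup row describes non-uniform quantization in which nonzero weights share 2^n - 1 cluster centers and zero keeps a dedicated center, and the Software Dependencies row names K-means as the clustering tool. A minimal sketch with n = 4 (15 nonzero centers plus zero, 16 codes in total) follows. It uses a plain 1-D Lloyd's k-means with quantile initialisation, which is our own choice rather than a detail from the paper, and all function names are hypothetical.

```python
import numpy as np

def kmeans_1d(values, k, iters=20):
    """Lloyd's k-means on a 1-D array, centers initialised at evenly spaced quantiles."""
    centers = np.quantile(values, np.linspace(0.0, 1.0, k))
    for _ in range(iters):
        # Assign every value to its nearest center.
        labels = np.argmin(np.abs(values[:, None] - centers[None, :]), axis=1)
        for j in range(k):
            members = values[labels == j]
            if members.size:  # an empty cluster keeps its previous center
                centers[j] = members.mean()
    labels = np.argmin(np.abs(values[:, None] - centers[None, :]), axis=1)
    return centers, labels

def quantize_sparse(x, n_bits=4, iters=20):
    """Map nonzeros to 2**n_bits - 1 learned centers; code 0 is reserved for exact zero."""
    x = np.asarray(x, dtype=np.float64)
    k = 2 ** n_bits - 1
    nonzero = x != 0
    codebook = np.zeros(k + 1)                    # codebook[0] == 0.0 is the zero center
    codes = np.zeros(x.shape, dtype=np.uint8)     # codes of up to 8 bits fit in uint8
    if nonzero.any():
        centers, labels = kmeans_1d(x[nonzero], k, iters)
        codebook[1:] = centers
        codes[nonzero] = labels + 1
    return codebook, codes

def dequantize(codebook, codes):
    """Look up each code in the codebook."""
    return codebook[codes]

rng = np.random.default_rng(0)
x = rng.normal(size=1000)
x[rng.random(1000) < 0.8] = 0.0                   # mimic a heavily pruned tensor
codebook, codes = quantize_sparse(x, n_bits=4)
x_hat = dequantize(codebook, codes)
print(codebook.size, int(codes.max()))
```

Reserving code 0 for exact zero keeps pruned entries lossless after dequantization, so the clustering budget is spent only on the surviving values.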
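The Software Dependencies row lists the 7zip compression algorithm as a final lossless stage. 7-Zip's native format is built on LZMA, so Python's standard `lzma` module serves here as a stand-in; whether the authors used comparable settings is not stated. The sketch packs a hypothetical array of 4-bit quantization codes, mostly zero as after aggressive pruning, and checks that the round trip is lossless. `pack_codes` and `unpack_codes` are hypothetical names.

```python
import lzma
import numpy as np

def pack_codes(codes: np.ndarray) -> bytes:
    """Losslessly compress uint8 quantization codes with LZMA."""
    return lzma.compress(codes.astype(np.uint8).tobytes(), preset=9)

def unpack_codes(blob: bytes, shape) -> np.ndarray:
    """Invert pack_codes: decompress and restore the original shape."""
    return np.frombuffer(lzma.decompress(blob), dtype=np.uint8).reshape(shape)

rng = np.random.default_rng(0)
idx = np.zeros(100_000, dtype=np.uint8)
nz = rng.random(idx.size) < 0.05                  # ~5% nonzero codes (assumption)
idx[nz] = rng.integers(1, 16, size=int(nz.sum()), dtype=np.uint8)
blob = pack_codes(idx)
restored = unpack_codes(blob, idx.shape)
print(len(idx.tobytes()), len(blob))
```

Because pruning leaves long runs of zero codes, a general-purpose dictionary coder can shrink the code array well below one byte per entry without any custom entropy coding.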