Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

ExCP: Extreme LLM Checkpoint Compression via Weight-Momentum Joint Shrinking

Authors: Wenshuo Li, Xinghao Chen, Han Shu, Yehui Tang, Yunhe Wang

ICML 2024 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We extensively evaluate our proposed ExCP framework on several models ranging from 410M to 7B parameters and demonstrate significant storage reduction while maintaining strong performance. For instance, we achieve approximately 70× compression for the Pythia-410M model, with the final performance being as accurate as the original model on various downstream tasks.
Researcher Affiliation	Collaboration	¹Huawei Noah's Ark Lab ²University of Science and Technology of China. Correspondence to: Xinghao Chen <EMAIL>, Yunhe Wang <EMAIL>.
Pseudocode	Yes	Algorithm 1 Compressing process
Open Source Code	Yes	Codes will be available at https://github.com/Gaffey/ExCP.
Open Datasets	Yes	We conduct our experiments on ViT-L32 (Dosovitskiy et al., 2020), Pythia-410M (Biderman et al., 2023), PanGu-π-1B and PanGu-π-7B (Wang et al., 2023) models. ... We train Pythia-410M on a subset of the standard Pile (Gao et al., 2020) dataset.
Dataset Splits	No	The paper states training on a 'subset of the standard Pile (Gao et al., 2020) dataset' and evaluating on benchmarks like 'HellaSwag, ARC-easy, PIQA, C3, CSL and LAMBADA tasks', but does not provide specific train/validation/test split percentages or counts for any of the datasets used.
Hardware Specification	No	The paper mentions general hardware like 'thousands of GPUs or computing cards like TPUs or Ascends' in the introduction but does not specify the exact GPU models, CPU types, or other hardware configurations used for their experiments.
Software Dependencies	No	The paper mentions software like 'Adam optimizer', '7zip compression algorithm', 'K-means algorithm', and 'opencompass', but does not provide specific version numbers for any of these software dependencies.
Experiment Setup	Yes	Unless otherwise specified, we set the α in equation 5 and β in equation 6 as 5e-5 and 2.0 in our experiments, respectively. The weights except zero are non-uniformly quantized to 2^n - 1 clustering center while the value zero occupies one center. And the bit number n is set as 4 in experiments.
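
The quantization step quoted under Experiment Setup can be illustrated with a small sketch. This is not the authors' code: it only assumes, from the quoted text, that exact zeros get one reserved codebook entry and that the remaining 2^n - 1 entries are cluster centers fitted to the nonzero weights, with n = 4. The function names (`kmeans_1d`, `quantize`, `dequantize`), the plain Lloyd's iteration, and the iteration count are hypothetical choices made for illustration. The pruning thresholds α and β are not modeled here.

```python
import random


def kmeans_1d(values, k, iters=20, seed=0):
    """Plain Lloyd's algorithm on scalars (illustrative, not the paper's code)."""
    rng = random.Random(seed)
    # Initialise from distinct values so the k starting centers differ.
    centers = rng.sample(sorted(set(values)), k)
    for _ in range(iters):
        buckets = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda i: abs(v - centers[i]))
            buckets[nearest].append(v)
        # An empty bucket keeps its previous center.
        centers = [sum(b) / len(b) if b else centers[i]
                   for i, b in enumerate(buckets)]
    return sorted(centers)


def quantize(weights, n_bits=4):
    """Return (codes, codebook). Code 0 is reserved for exact zeros;
    codes 1 .. 2^n_bits - 1 index centers fitted to the nonzero weights."""
    nonzero = [w for w in weights if w != 0.0]
    k = min(2 ** n_bits - 1, len(set(nonzero)))
    centers = kmeans_1d(nonzero, k) if k > 0 else []
    codebook = [0.0] + centers
    codes = []
    for w in weights:
        if w == 0.0:
            codes.append(0)
        else:
            nearest = min(range(len(centers)), key=lambda i: abs(w - centers[i]))
            codes.append(1 + nearest)
    return codes, codebook


def dequantize(codes, codebook):
    """Look each code up in the codebook to recover approximate weights."""
    return [codebook[c] for c in codes]
```

With n = 4 each weight is stored as a 4-bit code plus a shared 16-entry codebook. Pruned (zero) weights all share code 0, so the runs of zeros they produce are easy for a general-purpose compressor to shrink.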