Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
Breaking the Compression Ceiling: Data-Free Pipeline for Ultra-Efficient Delta Compression
Authors: Xiaohui Wang, Peng Ye, Chenyu Huang, Shenghe Zheng, Bo Zhang, Lei Bai, Wanli Ouyang, Tao Chen
NeurIPS 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments across (a) large language models (fine-tuned on LLaMA-2 7B and 13B) with up to 50× compression, (b) general NLP models (RoBERTa-base, T5-base) with up to 224× compression, (c) vision models (ViT-B/32, ViT-L/14) with up to 132× compression, and (d) multi-modal models (BEiT-3) with 18× compression, demonstrate that UltraDelta consistently outperforms existing methods, especially under ultra-high compression. |
| Researcher Affiliation | Collaboration | Xiaohui Wang¹, Peng Ye²,³, Chenyu Huang¹, Shenghe Zheng², Bo Zhang², Lei Bai², Wanli Ouyang²,³, Tao Chen¹,⁴. ¹Fudan University; ²Shanghai AI Laboratory; ³The Chinese University of Hong Kong; ⁴Shanghai Innovation Institute |
| Pseudocode | No | The paper describes the methodology in detailed text and formulas, and uses figures to illustrate the pipeline, but no explicit pseudocode or algorithm blocks are provided. |
| Open Source Code | Yes | Code is available at https://github.com/xiaohuiwang000/UltraDelta. |
| Open Datasets | Yes | We primarily evaluate UltraDelta on the LLaMA-2 series with sizes of 7B and 13B across three types of models: math (WizardMath [56]), code (WizardCoder [57]), and chat (LLaMA-2-Chat [78]). The models are evaluated using GSM8K [15] for math (accuracy), HumanEval [11] for code (pass@1), and TruthfulQA [49] for chat (accuracy). We also incorporate more recent LLMs and more challenging tasks: LLaMA-3.1-Tulu-8B [43] evaluated on MBPP+ [3] and HumanEval+ [11], Qwen2.5-7B-Instruct [89] evaluated on MATH [31] and GPQA [69], and Qwen3Guard-8B [77] evaluated on MMLU [30] and BBQ [63]. |
| Dataset Splits | Yes | Following [76, 91], we evaluate both T5-base [67] and RoBERTa-base [54] models on the GLUE [81] benchmark, covering CoLA [83], SST-2 [71], MRPC [21], STS-B [10], QQP [36], MNLI [84], QNLI [68], and RTE [24]. For T5-base, we report Spearman's ρ on STS-B and accuracy on the other tasks. Settings and results of RoBERTa-base are provided in App. C.3. ... We evaluate ViT-B/32 and ViT-L/14 [66] models on eight image classification datasets: SUN397 [87], Stanford Cars [40], RESISC45 [12], EuroSAT [29], SVHN [60], GTSRB [73], MNIST [44], and DTD [13]. Accuracy is used as the evaluation metric for all datasets. ... We compress delta weights on BEiT-3 [82] models fine-tuned on three datasets: VQA [26] (Visual Question Answering), NLVR2 [75] (Visual Reasoning), and COCO Captioning [51] (Image Captioning). ... We conducted controlled experiments on two datasets, SUN397 [87] and Cars [40], using a pretrained ViT-B/32 model. By varying the number of fine-tuning steps, we controlled the degree of model fitting. |
| Hardware Specification | Yes | All original models are stored in FP16 format. Experiments are conducted on an NVIDIA A800 GPU. |
| Software Dependencies | No | The paper mentions using the lm-evaluation-harness [23] and EvalPlus [52] frameworks, but it does not specify version numbers for these or other software components like Python or PyTorch. |
| Experiment Setup | Yes | For the 7B model, we apply 4-bit quantization and prune 95% of parameters, achieving a 32.9× compression ratio; for the 13B model, we prune 97% to reach 50.9× compression. ... Using 4-bit quantization combined with 99.5% pruning, UltraDelta achieves an impressive 224.6× compression ratio. ... With 4-bit quantization and 99% sparsity, UltraDelta achieves a 132.5× compression ratio. ... With 4-bit quantization and 90% sparsity, UltraDelta achieves an 18.4× compression ratio. ... We evaluate key hyper-parameters in DAC and MSA on 8 ViT-B/32 models, as shown in Fig. 6 (detailed results are in App. C.4.2). For bit-width in quantization (see Eq. 4), accuracy improves notably from 2-bit to 4-bit, with marginal gains beyond, suggesting 4-bit as an optimal balance. For sparsity step s_step in MSA (see Eq. 3), we evaluate it under a fixed 97% target sparsity and 4-bit quantization. A moderate step size (within [0.01, 0.02]) yields the best average performance. |
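
The settings quoted in the Experiment Setup row pair a bit-width and a sparsity level with a reported compression ratio. As a rough sanity check, the sketch below computes an idealized upper bound on the ratio (FP16 originals, with only the surviving quantized values stored) and the average number of bits per kept weight that each reported ratio would imply. This is illustrative arithmetic only, not the paper's storage format: the function names are hypothetical, and the mapping of the last three settings to model families is inferred from the ratios in the Research Type row.

```python
# Illustrative arithmetic only; this is not the paper's storage format.
ORIG_BITS = 16  # FP16 originals, per the Hardware Specification row


def ideal_ratio(quant_bits: int, sparsity: float, orig_bits: int = ORIG_BITS) -> float:
    """Upper bound on the ratio if only the surviving quantized values were stored."""
    density = 1.0 - sparsity  # fraction of delta weights kept after pruning
    return orig_bits / (quant_bits * density)


def implied_bits_per_kept_weight(reported_ratio: float, sparsity: float,
                                 orig_bits: int = ORIG_BITS) -> float:
    """Average storage per surviving weight implied by a reported ratio."""
    density = 1.0 - sparsity
    return orig_bits / (reported_ratio * density)


# (label, sparsity, reported ratio). Labels for the last three entries are
# an assumption of this sketch, matched by ratio to the Research Type row.
SETTINGS = [
    ("LLaMA-2 7B", 0.95, 32.9),
    ("LLaMA-2 13B", 0.97, 50.9),
    ("general NLP", 0.995, 224.6),
    ("vision", 0.99, 132.5),
    ("multi-modal", 0.90, 18.4),
]

for label, sparsity, reported in SETTINGS:
    bound = ideal_ratio(4, sparsity)
    bits = implied_bits_per_kept_weight(reported, sparsity)
    print(f"{label}: ideal {bound:.1f}x, reported {reported}x, "
          f"~{bits:.1f} bits per kept weight")
```

For the 7B setting, the idealized bound is 16 / (4 × 0.05) = 80×, well above the reported 32.9×. The gap would be consistent with extra bits spent per kept weight on recording positions, although this summary does not state how the sparse weights are stored.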