FM-Delta: Lossless Compression for Storing Massive Fine-tuned Foundation Models

Authors: Wanyi Ning, Jingyu Wang, Qi Qi, Mengde Zhu, Haifeng Sun, Daixuan Cheng, Jianxin Liao, Ce Zhang

NeurIPS 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our empirical and theoretical analysis reveals that most fine-tuned models in cloud have a small difference (delta) from their pre-trained models. To this end, we propose a novel lossless compression scheme FM-Delta specifically for storing massive fine-tuned models in cloud. FM-Delta maps fine-tuned and pre-trained model parameters into integers with the same bits, and entropy codes their integer delta. In this way, cloud only needs to store one uncompressed pre-trained model and other compressed fine-tuned models. Extensive experiments have demonstrated that FM-Delta efficiently reduces cloud storage consumption for massive fine-tuned models by an average of around 50% with only negligible additional time in most end-to-end cases. (A minimal code sketch of this integer-delta encoding appears after this table.)
Researcher Affiliation | Academia | Wanyi Ning (1), Jingyu Wang (1,2), Qi Qi (1), Mengde Zhu (1), Haifeng Sun (1), Daixuan Cheng (1), Jianxin Liao (1), Ce Zhang (3). (1) Beijing University of Posts and Telecommunications, (2) Pengcheng Laboratory, (3) University of Chicago. Emails: {ningwanyi, wangjingyu, qiqi8266, arnoldzhu, hfsun}@bupt.edu.cn, daixuancheng6@gmail.com, liaojx@bupt.edu.cn, cez@uchicago.edu
Pseudocode | No | The paper describes the algorithm in Section 4.1 and illustrates it in Figure 5, but does not present a formal pseudocode block or algorithm listing.
Open Source Code | Yes | Our code is available at https://github.com/ningwanyi/FM-Delta.
Open Datasets | Yes | We download four common model families for different learning tasks from the popular cloud provider Hugging Face, including Stable Diffusion [33], GPT2 [34], Bert-large-uncased [35], and ResNet50 [36]. ... on Pokemon Stable Diffusion [37], Wikitext103 GPT2 [38], SST2 BERT [39], and FER2013 ResNet50 [40].
Dataset Splits | No | The paper does not explicitly provide training/test/validation dataset splits with percentages or sample counts for the experiments.
Hardware Specification | Yes | Our experiments were run on an AMD Ryzen 9 5950X 16-Core Processor @ 2.2 GHz (32 logical processors) with 251 GB of main memory.
Software Dependencies | No | We implemented FM-Delta for models in PyTorch [50] format. Our experiments were run on an AMD Ryzen 9 5950X 16-Core Processor @ 2.2 GHz (32 logical processors) with 251 GB of main memory. In our end-to-end simulation, we simulate the communication between cloud and users through the Python 'socket' library. Model compression and decompression processes run in parallel with network transfer by reading and writing models in chunks. (A sketch of this chunked, overlapped transfer appears after this table.)
Experiment Setup | No | The paper mentions aspects of fine-tuning such as a 'small learning rate and a limited number of steps' and tracks the 'Euclidean distance' over 'Steps' (e.g., 1000 steps) for specific models and datasets, but it does not provide a comprehensive list of concrete hyperparameter values (e.g., a specific learning rate, batch size, or detailed optimizer configuration) for its experiments.
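
To make the integer-delta description in the "Research Type" row concrete, the sketch below shows one way the idea can be realized: fine-tuned and pre-trained float32 parameters are reinterpreted as same-width integers, their integer difference is taken, and the delta stream is entropy coded. This is a minimal sketch, not the authors' implementation; the NumPy bit-reinterpretation, the float32 assumption, and the use of zlib as a stand-in entropy coder are illustrative choices, and the paper's actual mapping and coder may differ.

```python
# Hypothetical sketch of the FM-Delta idea (assumes float32 parameters);
# zlib stands in for the paper's entropy coder.
import zlib
import numpy as np
import torch

def compress_delta(finetuned: torch.Tensor, pretrained: torch.Tensor) -> bytes:
    # Reinterpret float32 parameters as int32 without changing any bits.
    ft = np.ascontiguousarray(finetuned.detach().cpu().numpy()).view(np.int32)
    pt = np.ascontiguousarray(pretrained.detach().cpu().numpy()).view(np.int32)
    delta = ft - pt                           # small values when the models are close
    return zlib.compress(delta.tobytes())     # entropy-code the integer delta

def decompress_delta(blob: bytes, pretrained: torch.Tensor) -> torch.Tensor:
    pt = np.ascontiguousarray(pretrained.detach().cpu().numpy()).view(np.int32)
    delta = np.frombuffer(zlib.decompress(blob), dtype=np.int32).reshape(pt.shape)
    ft = (pt + delta).view(np.float32)        # exact, bit-level reconstruction
    return torch.from_numpy(ft.copy())
```

Applied per parameter tensor, this lets the cloud keep one uncompressed pre-trained model and store only the entropy-coded deltas for each fine-tuned model, matching the storage scheme described in the abstract.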
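
The "Software Dependencies" row notes that compression and decompression run in parallel with network transfer by reading and writing models in chunks over a Python socket. The sketch below illustrates one way such overlap can be arranged; the 4 MiB chunk size, the length-prefixed framing, and the per-chunk zlib compression are assumptions for illustration, not the protocol used in the paper's simulation.

```python
# Illustrative chunked sender/receiver; framing and chunk size are assumed.
import socket
import struct
import zlib

CHUNK = 4 * 1024 * 1024  # 4 MiB per chunk (assumption)

def send_model(path: str, sock: socket.socket) -> None:
    """Read the model file chunk by chunk, compress each chunk, and send it immediately."""
    with open(path, "rb") as f:
        while chunk := f.read(CHUNK):
            payload = zlib.compress(chunk)
            sock.sendall(struct.pack("!I", len(payload)) + payload)
    sock.sendall(struct.pack("!I", 0))        # zero-length frame marks end of stream

def recv_model(path: str, sock: socket.socket) -> None:
    """Receive length-prefixed frames, decompressing and writing while data still arrives."""
    with open(path, "wb") as f:
        while True:
            (n,) = struct.unpack("!I", _recv_exact(sock, 4))
            if n == 0:
                break
            f.write(zlib.decompress(_recv_exact(sock, n)))

def _recv_exact(sock: socket.socket, n: int) -> bytes:
    buf = bytearray()
    while len(buf) < n:
        part = sock.recv(n - len(buf))
        if not part:
            raise ConnectionError("socket closed mid-stream")
        buf.extend(part)
    return bytes(buf)
```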