Delta-CoMe: Training-Free Delta-Compression with Mixed-Precision for Large Language Models
Authors: Bowen Ping, Shuo Wang, Hanqing Wang, Xu Han, Yuzhuang Xu, Yukun Yan, Yun Chen, Baobao Chang, Zhiyuan Liu, Maosong Sun
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate our approach on various fine-tuned LLMs, including math LLMs, code LLMs, chat LLMs, and even VLMs. Experimental results demonstrate that our approach performs comparably to full fine-tuned LLMs, surpassing both low-rank and low-bit baselines by a considerable margin. |
| Researcher Affiliation | Academia | Bowen Ping1 Shuo Wang2 Hanqing Wang3 Xu Han2,4,5 Yuzhuang Xu2 Yukun Yan2 Yun Chen3 Baobao Chang1 Zhiyuan Liu2,4,5 Maosong Sun2,4,5 1Peking University 2Dept. of Comp. Sci. & Tech., Tsinghua University, Beijing, China 3Shanghai University of Finance and Economics 4Institute for AI, Tsinghua University, Beijing, China 5Beijing National Research Center for Information Science and Technology |
| Pseudocode | No | The paper describes the proposed method using descriptive text and mathematical equations, but it does not include a dedicated pseudocode block or an algorithm figure. |
| Open Source Code | Yes | Code will be publicly available at https://github.com/thunlp/Delta-CoMe. |
| Open Datasets | Yes | For this task, we employ GSM8K (Cobbe et al., 2021) and MATH (Hendrycks et al., 2021) as the evaluation datasets... For this task, we use HumanEval (Chen et al., 2021) and MBPP (Austin et al., 2021) as the evaluation datasets... For evaluating chat LLMs, we select TruthfulQA (Lin et al., 2022) and SafetyBench (Zhang et al., 2023) as the evaluation datasets... For this task, we use GQA (Hudson & Manning, 2019) and TextVQA (Singh et al., 2019). |
| Dataset Splits | No | The paper lists various evaluation datasets, including GSM8K, MATH, HumanEval, MBPP, TruthfulQA, SafetyBench, GQA, and TextVQA. While these datasets are used for evaluation, the paper does not specify how they were formally split into distinct training, validation, and test subsets for the Delta-CoMe experiments. |
| Hardware Specification | Yes | All methods are evaluated on NVIDIA A100 GPUs. |
| Software Dependencies | No | The paper mentions implementing a 'Triton (Tillet et al., 2019) kernel' and comparing performance with a 'PyTorch implementation', but does not specify exact version numbers for these or other software dependencies. |
| Experiment Setup | Yes | During inference, we use greedy search. We set the LoRA rank to 128 and the scale factor to 16, using a cosine warmup schedule with a warmup ratio of 0.04 and a peak learning rate of 1e-4. For each task, we trained the LoRA for 3 epochs. |
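The hyperparameters quoted in the Experiment Setup row can be collected into a minimal configuration sketch. The dictionary keys below are illustrative names chosen for readability, not identifiers from the authors' code; only the values come from the paper.

```python
# Hyperparameters reported in the paper's experiment setup.
# Key names are illustrative, not taken from the Delta-CoMe codebase.
lora_training_config = {
    "lora_rank": 128,            # LoRA rank r
    "lora_alpha": 16,            # scale factor
    "lr_schedule": "cosine_warmup",
    "warmup_ratio": 0.04,
    "peak_learning_rate": 1e-4,
    "epochs": 3,
    "decoding": "greedy",        # greedy search at inference
}

# In the standard LoRA formulation the update is scaled by alpha / r,
# so these settings imply an effective scaling of 16 / 128 = 0.125.
effective_scale = (
    lora_training_config["lora_alpha"] / lora_training_config["lora_rank"]
)
print(f"Effective LoRA scaling (alpha/r): {effective_scale}")
```

Note that the alpha/r interpretation of the "scale factor" follows the common LoRA convention; the paper does not state this explicitly.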