Delta-CoMe: Training-Free Delta-Compression with Mixed-Precision for Large Language Models

Authors: Bowen Ping, Shuo Wang, Hanqing Wang, Xu Han, Yuzhuang Xu, Yukun Yan, Yun Chen, Baobao Chang, Zhiyuan Liu, Maosong Sun

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate our approach on various fine-tuned LLMs, including math LLMs, code LLMs, chat LLMs, and even VLMs. Experimental results demonstrate that our approach performs comparably to full fine-tuned LLMs, surpassing both low-rank and low-bit baselines by a considerable margin.
Researcher Affiliation | Academia | Bowen Ping (1), Shuo Wang (2), Hanqing Wang (3), Xu Han (2,4,5), Yuzhuang Xu (2), Yukun Yan (2), Yun Chen (3), Baobao Chang (1), Zhiyuan Liu (2,4,5), Maosong Sun (2,4,5). Affiliations: (1) Peking University; (2) Dept. of Comp. Sci. & Tech., Tsinghua University, Beijing, China; (3) Shanghai University of Finance and Economics; (4) Institute for AI, Tsinghua University, Beijing, China; (5) Beijing National Research Center for Information Science and Technology.
Pseudocode | No | The paper describes the proposed method using descriptive text and mathematical equations, but it does not include a dedicated pseudocode block or an algorithm figure.
Open Source Code | Yes | Code will be publicly available at https://github.com/thunlp/Delta-CoMe.
Open Datasets | Yes | For this task, we employ GSM8K (Cobbe et al., 2021) and MATH (Hendrycks et al., 2021) as the evaluation datasets... For this task, we use HumanEval (Chen et al., 2021) and MBPP (Austin et al., 2021) as the evaluation datasets... For evaluating chat LLMs, we select TruthfulQA (Lin et al., 2022) and SafetyBench (Zhang et al., 2023) as the evaluation datasets... For this task, we use GQA (Hudson & Manning, 2019) and TextVQA (Singh et al., 2019). (A dataset-loading sketch follows the table.)
Dataset Splits | No | The paper lists various evaluation datasets such as GSM8K, MATH, HumanEval, MBPP, TruthfulQA, SafetyBench, GQA, and TextVQA. While these datasets are used for evaluation, the paper does not specify how they were split into distinct training, validation, and test subsets for the Delta-CoMe experiments.
Hardware Specification | Yes | All methods are evaluated on NVIDIA A100 GPUs.
Software Dependencies | No | The paper mentions implementing a 'Triton (Tillet et al., 2019) kernel' and comparing its performance with a 'PyTorch implementation', but it does not specify exact version numbers for these or other software dependencies.
Experiment Setup | Yes | During inference, we use greedy search. We set the LoRA rank to 128 and the scale factor to 16, using a cosine warmup schedule with a warmup ratio of 0.04 and a peak learning rate of 1e-4. For each task, we trained the LoRA for 3 epochs. (A configuration sketch follows the table.)
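
The benchmarks quoted in the Open Datasets row are publicly available. As a minimal illustration only (not the authors' evaluation harness), the sketch below loads a few of them with the Hugging Face `datasets` library; the Hub dataset IDs, config names, and split names are assumptions and may differ from the copies actually used in the paper.

```python
# Hedged sketch: loading some of the evaluation benchmarks named above.
# Dataset IDs, configs, and splits are assumptions, not taken from the paper.
from datasets import load_dataset

gsm8k = load_dataset("gsm8k", "main", split="test")                          # grade-school math word problems
humaneval = load_dataset("openai_humaneval", split="test")                   # Python code generation
mbpp = load_dataset("mbpp", split="test")                                    # Python code generation
truthfulqa = load_dataset("truthful_qa", "generation", split="validation")   # truthfulness for chat LLMs

print(len(gsm8k), len(humaneval), len(mbpp), len(truthfulqa))
```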
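
The Experiment Setup row quotes the hyperparameters used for the paper's LoRA baseline (Delta-CoMe itself is training-free). The sketch below is an assumption about how that configuration might be expressed with the Hugging Face `peft` and `transformers` libraries, not the authors' training code; the backbone model name and target modules are placeholders, and only the rank 128, scale factor 16 (read here as lora_alpha), cosine schedule with warmup ratio 0.04, peak learning rate 1e-4, and 3 epochs come from the quoted setup.

```python
# Hedged sketch of the LoRA baseline configuration quoted above.
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from peft import LoraConfig, get_peft_model

base = "facebook/opt-125m"  # placeholder stand-in for the larger backbones used in the paper
model = AutoModelForCausalLM.from_pretrained(base)
tokenizer = AutoTokenizer.from_pretrained(base)

lora_cfg = LoraConfig(
    r=128,                                 # LoRA rank from the quoted setup
    lora_alpha=16,                         # "scale factor" of 16, interpreted as lora_alpha
    target_modules=["q_proj", "v_proj"],   # assumed attention projections
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_cfg)

args = TrainingArguments(
    output_dir="lora-baseline",
    learning_rate=1e-4,          # peak learning rate
    lr_scheduler_type="cosine",  # cosine schedule
    warmup_ratio=0.04,           # warmup ratio
    num_train_epochs=3,          # 3 epochs per task
)

# A Trainer would then be built with a task-specific training set, e.g.
# Trainer(model=model, args=args, train_dataset=..., tokenizer=tokenizer).train().
# At inference time, greedy search corresponds to model.generate(..., do_sample=False).
```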