Delta-CoMe: Training-Free Delta-Compression with Mixed-Precision for Large Language Models

Authors: Bowen Ping, Shuo Wang, Hanqing Wang, Xu Han, Yuzhuang Xu, Yukun Yan, Yun Chen, Baobao Chang, Zhiyuan Liu, Maosong Sun

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate our approach on various fine-tuned LLMs, including math LLMs, code LLMs, chat LLMs, and even VLMs. Experimental results demonstrate that our approach performs comparably to full fine-tuned LLMs, surpassing both low-rank and low-bit baselines by a considerable margin.
Researcher Affiliation | Academia | Bowen Ping (1), Shuo Wang (2), Hanqing Wang (3), Xu Han (2,4,5), Yuzhuang Xu (2), Yukun Yan (2), Yun Chen (3), Baobao Chang (1), Zhiyuan Liu (2,4,5), Maosong Sun (2,4,5). Affiliations: (1) Peking University; (2) Dept. of Comp. Sci. & Tech., Tsinghua University, Beijing, China; (3) Shanghai University of Finance and Economics; (4) Institute for AI, Tsinghua University, Beijing, China; (5) Beijing National Research Center for Information Science and Technology.
Pseudocode | No | The paper describes the proposed method using descriptive text and mathematical equations, but it does not include a dedicated pseudocode block or an algorithm figure.
Open Source Code | Yes | Code will be publicly available at https://github.com/thunlp/Delta-CoMe.
Open Datasets | Yes | For this task, we employ GSM8K (Cobbe et al., 2021) and MATH (Hendrycks et al., 2021) as the evaluation datasets... For this task, we use HumanEval (Chen et al., 2021) and MBPP (Austin et al., 2021) as the evaluation datasets... For evaluating chat LLMs, we select TruthfulQA (Lin et al., 2022) and SafetyBench (Zhang et al., 2023) as the evaluation datasets... For this task, we use GQA (Hudson & Manning, 2019) and TextVQA (Singh et al., 2019). (A dataset-loading sketch follows the table.)
Dataset Splits | No | The paper lists various evaluation datasets such as GSM8K, MATH, HumanEval, MBPP, TruthfulQA, SafetyBench, GQA, and TextVQA. While these datasets are used for evaluation, the paper does not specify how they were split into distinct training, validation, and test subsets for the Delta-CoMe experiments.
Hardware Specification | Yes | All methods are evaluated on NVIDIA A100 GPUs.
Software Dependencies | No | The paper mentions implementing a 'Triton (Tillet et al., 2019) kernel' and comparing its performance with a 'PyTorch implementation', but it does not specify exact version numbers for these or other software dependencies.
Experiment Setup | Yes | During inference, we use greedy search. We set the LoRA rank to 128 and the scale factor to 16, using a cosine warmup schedule with a warmup ratio of 0.04 and a peak learning rate of 1e-4. For each task, we trained the LoRA for 3 epochs. (A configuration sketch follows the table.)
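
The benchmarks quoted in the Open Datasets row are publicly available. As a minimal illustration only (not the authors' evaluation harness), the sketch below loads a few of them with the Hugging Face `datasets` library; the Hub dataset IDs, config names, and split names are assumptions and may differ from the copies actually used in the paper.

```python
# Hedged sketch: loading some of the evaluation benchmarks named above.
# Dataset IDs, configs, and splits are assumptions, not taken from the paper.
from datasets import load_dataset

gsm8k = load_dataset("gsm8k", "main", split="test")                          # grade-school math word problems
humaneval = load_dataset("openai_humaneval", split="test")                   # Python code generation
mbpp = load_dataset("mbpp", split="test")                                    # Python code generation
truthfulqa = load_dataset("truthful_qa", "generation", split="validation")   # truthfulness for chat LLMs

print(len(gsm8k), len(humaneval), len(mbpp), len(truthfulqa))
```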
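
The Experiment Setup row quotes the hyperparameters used for the paper's LoRA baseline (Delta-CoMe itself is training-free). The sketch below is an assumption about how that configuration might be expressed with the Hugging Face `peft` and `transformers` libraries, not the authors' training code; the backbone model name and target modules are placeholders, and only the rank 128, scale factor 16 (read here as lora_alpha), cosine schedule with warmup ratio 0.04, peak learning rate 1e-4, and 3 epochs come from the quoted setup.

```python
# Hedged sketch of the LoRA baseline configuration quoted above.
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from peft import LoraConfig, get_peft_model

base = "facebook/opt-125m"  # placeholder stand-in for the larger backbones used in the paper
model = AutoModelForCausalLM.from_pretrained(base)
tokenizer = AutoTokenizer.from_pretrained(base)

lora_cfg = LoraConfig(
    r=128,                                 # LoRA rank from the quoted setup
    lora_alpha=16,                         # "scale factor" of 16, interpreted as lora_alpha
    target_modules=["q_proj", "v_proj"],   # assumed attention projections
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_cfg)

args = TrainingArguments(
    output_dir="lora-baseline",
    learning_rate=1e-4,          # peak learning rate
    lr_scheduler_type="cosine",  # cosine schedule
    warmup_ratio=0.04,           # warmup ratio
    num_train_epochs=3,          # 3 epochs per task
)

# A Trainer would then be built with a task-specific training set, e.g.
# Trainer(model=model, args=args, train_dataset=..., tokenizer=tokenizer).train().
# At inference time, greedy search corresponds to model.generate(..., do_sample=False).
```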