Kangaroo: Lossless Self-Speculative Decoding for Accelerating LLMs via Double Early Exiting
Authors: Fangcheng Liu, Yehui Tang, Zhenhua Liu, Yunsheng Ni, Duyu Tang, Kai Han, Yunhe Wang
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on multiple benchmarks demonstrate our effectiveness, where Kangaroo achieves walltime speedups up to 2.04×, outperforming Medusa-1 with 88.7% fewer additional parameters. |
| Researcher Affiliation | Industry | Huawei Noah's Ark Lab; Consumer Business Group, Huawei {liufangcheng3,yehui.tang,kai.han,yunhe.wang}@huawei.com |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | The code for Kangaroo is available at https://github.com/Equationliu/Kangaroo. |
| Open Datasets | Yes | We conduct experiments on Vicuna [12] models with size of 7B and 13B. ... For Kangaroo, we train the adapter A for 10 epochs with the AdamW [42] optimizer on the ShareGPT dataset following Medusa [20]. |
| Dataset Splits | Yes | We evaluate the acceleration performance with the recently proposed Spec-Bench [22], which consists of six subtasks including Multi-turn Conversation, Translation, Summarization, Question Answering, Mathematical Reasoning and Retrieval-augmented Generation. |
| Hardware Specification | Yes | The training of the adapter A for Vicuna-7B takes around 24 hours on 8 NVIDIA V100 GPUs. |
| Software Dependencies | No | The paper mentions the AdamW [42] optimizer but does not specify version numbers for software dependencies such as programming languages or libraries. |
| Experiment Setup | Yes | For Kangaroo, we train the adapter A for 10 epochs with the AdamW [42] optimizer on the ShareGPT dataset following Medusa [20]. ... During the inference stage, we set ℓ = 2 for Vicuna-7B and ℓ = 3 for Vicuna-13B. For the single-sequence decoding in Kangaroo, we set γ = 6 and η = 0.6. For the dynamic tree decoding scenario, we set Top-K as 10, and η = 0.4. (Hedged sketches of the training and decoding setups follow the table.) |
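To make the training row concrete, below is a minimal sketch of the adapter training loop: 10 epochs of AdamW, matching the quote. The adapter architecture, learning rate, dimensions, and the random placeholder batches (standing in for ShareGPT features) are all assumptions for illustration; the actual implementation is in the linked repository.

```python
import torch
from torch.nn import functional as F
from torch.optim import AdamW

D_MODEL, VOCAB = 4096, 32000  # Vicuna-7B hidden size and vocab (assumed here)

# Hypothetical adapter A: a single attention block bridging the shallow
# sub-network's hidden states to the frozen LM head. The real architecture
# is defined in the Kangaroo repository.
adapter = torch.nn.TransformerEncoderLayer(d_model=D_MODEL, nhead=32,
                                           batch_first=True)
lm_head = torch.nn.Linear(D_MODEL, VOCAB, bias=False)  # frozen, shared with the LLM
lm_head.requires_grad_(False)

optimizer = AdamW(adapter.parameters(), lr=2e-4)  # lr is an assumption

# Placeholder batches standing in for ShareGPT features: shallow hidden
# states paired with the target model's next-token ids.
def fake_batches(n=4, seq=64):
    for _ in range(n):
        yield torch.randn(1, seq, D_MODEL), torch.randint(0, VOCAB, (1, seq))

for epoch in range(10):  # "train the adapter A for 10 epochs"
    for hidden, targets in fake_batches():
        logits = lm_head(adapter(hidden))
        loss = F.cross_entropy(logits.view(-1, VOCAB), targets.view(-1))
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```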
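For the inference hyperparameters, the following is a minimal sketch of Kangaroo-style single-sequence self-speculative decoding with double early exiting: the draft model is the target LLM's first ℓ layers plus the adapter A (the first exit), and drafting stops once the draft confidence falls below η or γ tokens have been drafted (the second exit). The callables `shallow_forward`, `adapter_head`, and `full_forward` are hypothetical stand-ins, not the repo's actual API; the defaults follow the γ = 6 and η = 0.6 reported above.

```python
import torch

def kangaroo_decode(shallow_forward, adapter_head, full_forward, prompt_ids,
                    max_new_tokens=128, gamma=6, eta=0.6):
    """Sketch of single-sequence self-speculative decoding a la Kangaroo.

    Hypothetical callables (stand-ins, not the repo's real API):
      shallow_forward(ids): hidden states of the LLM's first l layers
      adapter_head(h):      adapter A mapping hidden states to vocab logits
      full_forward(ids):    full-model logits for every position of ids
    """
    ids = prompt_ids
    produced = 0
    while produced < max_new_tokens:
        # Drafting: self-draft with the shallow sub-network plus adapter,
        # exiting early once confidence drops below eta (the second early
        # exit) or gamma draft tokens have been produced.
        draft_ids, n_draft = ids, 0
        while n_draft < gamma:
            logits = adapter_head(shallow_forward(draft_ids))[:, -1, :]
            conf, tok = torch.softmax(logits, dim=-1).max(dim=-1)
            if conf.item() < eta:
                break
            draft_ids = torch.cat([draft_ids, tok.view(1, 1)], dim=-1)
            n_draft += 1
        # Verification: one parallel pass of the full model over the draft.
        # Greedy targets for the last n_draft + 1 positions cover every
        # drafted token plus one bonus/correction token.
        targets = full_forward(draft_ids)[0, -(n_draft + 1):, :].argmax(dim=-1)
        n_accept = 0
        while n_accept < n_draft and \
                draft_ids[0, ids.shape[-1] + n_accept] == targets[n_accept]:
            n_accept += 1
        # Keep the accepted draft tokens plus the full model's next token,
        # so the output matches plain greedy decoding exactly (lossless).
        ids = torch.cat([draft_ids[:, :ids.shape[-1] + n_accept],
                         targets[n_accept].view(1, 1)], dim=-1)
        produced += n_accept + 1
    return ids
```

In the dynamic tree decoding scenario reported above, the drafter instead expands Top-K = 10 candidates per step into a token tree gated by η = 0.4; that variant is omitted here for brevity.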