Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Dynamic-Width Speculative Beam Decoding for LLM Inference
Authors: Zongyue Qin, Zifan He, Neha Prakriya, Jason Cong, Yizhou Sun
AAAI 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results show that our approach achieves a 1.5-1.9× speed-up and 1.8-2.5× smaller energy consumption than beam sampling, without sacrificing performance on downstream tasks. Besides, it can produce significantly higher-quality outputs than speculative decoding, while maintaining comparable time, memory, and energy costs. |
| Researcher Affiliation | Academia | University of California Los Angeles, CA, USA EMAIL |
| Pseudocode | Yes | Algorithm 1: Draft and Verification for Speculative Beam Sampling |
| Open Source Code | Yes | Our code is open source (footnote 1: https://github.com/ZongyueQin/DSBD). |
| Open Datasets | Yes | We use public datasets: SQuAD (Rajpurkar, Jia, and Liang 2018), Spider (Yu et al. 2018), and MTBench (Zheng et al. 2023). |
| Dataset Splits | No | The paper uses public datasets: SQuAD (Rajpurkar, Jia, and Liang 2018), Spider (Yu et al. 2018), and MTBench (Zheng et al. 2023). However, it does not explicitly provide details about training/test/validation dataset splits, percentages, or methodology for these datasets in the main text. |
| Hardware Specification | No | We use Llama-2-13B, Llama-3.1-8B, and OPT-13B as the large models as they are the largest models our GPU could run. |
| Software Dependencies | No | The paper mentions various LLM architectures and models like transformer (Vaswani et al. 2017), GPT-4 (Achiam et al. 2023), Llama-3 (AI@Meta 2024), PALM (Anil et al. 2023), OPT (Zhang et al. 2022), Llama-2 (Touvron et al. 2023), and Llama-68M (Miao et al. 2023). However, it does not provide specific version numbers for software dependencies such as programming languages, deep learning frameworks (e.g., PyTorch, TensorFlow), or other relevant libraries. |
| Experiment Setup | Yes | The width of beam sampling ranges from 1 to 4. For our method, we vary the draft beam width W_S ∈ {2, 3, 4, 5, 6}, the threshold t ∈ {0.7, 0.9}, and set W_min ∈ {1, 2, 3}. |
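
The Experiment Setup row lists three hyperparameter ranges for the method plus a beam-width range for the baseline. As a rough sketch of what that search space looks like, the snippet below enumerates the combinations. It assumes the full Cartesian product is swept, which the quoted text does not confirm. The dictionary keys and the function name `dsbd_grid` are hypothetical and are not taken from the authors' repository.

```python
from itertools import product

# Values quoted in the Experiment Setup row.
DRAFT_BEAM_WIDTHS = [2, 3, 4, 5, 6]   # draft beam width (WS in the row)
THRESHOLDS = [0.7, 0.9]               # threshold t
MIN_WIDTHS = [1, 2, 3]                # Wmin (read here as a minimum width; an assumption)
BASELINE_BEAM_WIDTHS = [1, 2, 3, 4]   # beam sampling baseline, widths 1 to 4


def dsbd_grid():
    """Enumerate every combination of the three quoted ranges.

    Assumption: all combinations are evaluated. The paper may have
    swept only a subset of them.
    """
    return [
        {"draft_beam_width": w_s, "threshold": t, "min_width": w_min}
        for w_s, t, w_min in product(DRAFT_BEAM_WIDTHS, THRESHOLDS, MIN_WIDTHS)
    ]


configs = dsbd_grid()
print(len(configs))  # 5 * 2 * 3 = 30
```

Under that assumption the method has 30 candidate configurations, compared with 4 settings for the beam sampling baseline.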