Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026.

Dynamic-Width Speculative Beam Decoding for LLM Inference

Authors: Zongyue Qin, Zifan He, Neha Prakriya, Jason Cong, Yizhou Sun

AAAI 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Experimental results show that our approach achieves a 1.5-1.9× speed-up and 1.8-2.5× smaller energy consumption than beam sampling, without sacrificing performance on downstream tasks. Besides, it can produce significantly higher-quality outputs than speculative decoding, while maintaining comparable time, memory, and energy costs.
Researcher Affiliation	Academia	University of California, Los Angeles, CA, USA
Pseudocode	Yes	Algorithm 1: Draft and Verification for Speculative Beam Sampling
Open Source Code	Yes	Our code is open source. Footnote 1: https://github.com/ZongyueQin/DSBD
Open Datasets	Yes	We use public datasets: SQuAD (Rajpurkar, Jia, and Liang 2018), Spider (Yu et al. 2018), and MTBench (Zheng et al. 2023).
Dataset Splits	No	The paper uses public datasets: SQuAD (Rajpurkar, Jia, and Liang 2018), Spider (Yu et al. 2018), and MTBench (Zheng et al. 2023). However, it does not explicitly provide details about training/test/validation dataset splits, percentages, or methodology for these datasets in the main text.
Hardware Specification	No	We use Llama-2-13B, Llama-3.1-8B, and OPT-13B as the large models as they are the largest models our GPU could run.
Software Dependencies	No	The paper mentions various LLM architectures and models like transformer (Vaswani et al. 2017), GPT-4 (Achiam et al. 2023), Llama-3 (AI@Meta 2024), PALM (Anil et al. 2023), OPT (Zhang et al. 2022), Llama-2 (Touvron et al. 2023), and Llama-68M (Miao et al. 2023). However, it does not provide specific version numbers for software dependencies such as programming languages, deep learning frameworks (e.g., PyTorch, TensorFlow), or other relevant libraries.
Experiment Setup	Yes	The width of beam sampling ranges from 1 to 4. For our method, we vary the draft beam width W_S ∈ {2, 3, 4, 5, 6}, the threshold t ∈ {0.7, 0.9}, and set W_min ∈ {1, 2, 3}.
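
The hyperparameter ranges quoted in the Experiment Setup row can be enumerated as a grid. The sketch below is only an illustration: it assumes a full Cartesian product over the three quoted ranges, which the quoted text does not confirm, and the names `build_grid`, `draft_beam_width`, `threshold`, and `min_width` are hypothetical and not taken from the authors' repository.

```python
from itertools import product

# Values quoted in the Experiment Setup row. The constant names are
# illustrative and do not come from the authors' code.
DRAFT_BEAM_WIDTHS = [2, 3, 4, 5, 6]  # draft beam width
THRESHOLDS = [0.7, 0.9]              # threshold t
MIN_WIDTHS = [1, 2, 3]               # minimum width
BASELINE_BEAM_WIDTHS = [1, 2, 3, 4]  # beam sampling baseline, widths 1 to 4


def build_grid():
    """Return every (draft width, threshold, min width) combination.

    Assumption: a full Cartesian product. The paper may evaluate only a
    subset of these combinations.
    """
    return [
        {"draft_beam_width": w_s, "threshold": t, "min_width": w_min}
        for w_s, t, w_min in product(DRAFT_BEAM_WIDTHS, THRESHOLDS, MIN_WIDTHS)
    ]


grid = build_grid()
print(len(grid))  # 5 * 2 * 3 = 30 configurations
```

Under that assumption the grid holds 30 configurations for the proposed method, alongside the four baseline beam widths.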