A Theoretical Perspective for Speculative Decoding Algorithm
Authors: Ming Yin, Minshuo Chen, Kaixuan Huang, Mengdi Wang
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical (with a supporting experiment) | Our analysis covers the theoretical limits of speculative decoding, batch algorithms, and output quality-inference acceleration tradeoffs. Our results reveal the fundamental connections between different components of LLMs via total variation distances and show how they jointly affect the efficiency of decoding algorithms. ... A simple experiment in Section 4.2 is also consistent with our theoretical finding. |
| Researcher Affiliation | Academia | Ming Yin Princeton University my0049@princeton.edu Minshuo Chen Northwestern University minshuo.chen@northwestern.edu Kaixuan Huang Princeton University kaixuanh@princeton.edu Mengdi Wang Princeton University mengdiw@princeton.edu |
| Pseudocode | Yes | Algorithm 1 Speculative Decoding [10, 24] |
| Open Source Code | No | To implement Decoding-UNO, we modify the speculative sampling function in the Hugging Face transformers/generation/utils.py file as follows (the variable eps corresponds to ϵ in Table 1). |
| Open Datasets | Yes | We test 200 prompts from Alpaca-Farm-Eval Dataset [13] with 500 responses/comparisons per prompt. ... We specify draft model p as pythia-70m and target model q as pythia-2.8b from Eleuther AI [7]. |
| Dataset Splits | No | The paper does not explicitly provide training/validation/test dataset splits. It mentions testing 200 prompts from Alpaca-Farm-Eval Dataset. |
| Hardware Specification | Yes | This is conducted in a single A100 GPU. |
| Software Dependencies | No | The paper mentions using Hugging Face models and PyTorch functions, but does not provide specific version numbers for these software dependencies (e.g., PyTorch version, Hugging Face transformers library version). |
| Experiment Setup | Yes | Table 1: Win Rate for Decoding-OPT vs Decoding-UNO with different over-acceptance thresholds ϵ. The acceptance probability b(x) = min{1, (q(x) + ϵ)/p(x)}. |
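The acceptance rule quoted in the Experiment Setup row can be sketched as follows. This is a minimal illustration, not the paper's modification of `transformers/generation/utils.py`: the function names and toy distributions are ours, and we assume the over-acceptance variant (Decoding-UNO) adds ϵ to the target probability q(x) before the usual min{1, q(x)/p(x)} test.

```python
import numpy as np

def accept_prob(p_x, q_x, eps=0.0):
    """Acceptance probability for a token x drafted by the small model.

    eps = 0 recovers standard speculative decoding: min{1, q(x)/p(x)}.
    eps > 0 sketches the over-acceptance variant (Decoding-UNO),
    which accepts more aggressively: min{1, (q(x) + eps)/p(x)}.
    """
    return min(1.0, (q_x + eps) / p_x)

def speculative_step(p, q, rng, eps=0.0):
    """One accept/reject step of speculative sampling (illustrative sketch).

    p: draft-model distribution over the vocabulary (1-D array, sums to 1)
    q: target-model distribution over the vocabulary (1-D array, sums to 1)
    """
    x = rng.choice(len(p), p=p)                # draft model proposes token x
    if rng.random() < accept_prob(p[x], q[x], eps):
        return x                               # accepted: keep the draft token
    # rejected: resample from the normalized residual max(q - p, 0),
    # which (for eps = 0) makes the output exactly distributed as q
    residual = np.maximum(q - p, 0.0)
    residual /= residual.sum()
    return rng.choice(len(q), p=residual)
```

Larger ϵ accepts more draft tokens (faster decoding) at the cost of drifting from the target distribution q, which is the quality/speed tradeoff that Table 1's win rates measure.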