A Theoretical Perspective for Speculative Decoding Algorithm

Authors: Ming Yin, Minshuo Chen, Kaixuan Huang, Mengdi Wang

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our analysis covers the theoretical limits of speculative decoding, batch algorithms, and output quality-inference acceleration tradeoffs. Our results reveal the fundamental connections between different components of LLMs via total variation distances and show how they jointly affect the efficiency of decoding algorithms. ... A simple experiment in Section 4.2 is also consistent with our theoretical finding.
Researcher Affiliation | Academia | Ming Yin, Princeton University (my0049@princeton.edu); Minshuo Chen, Northwestern University (minshuo.chen@northwestern.edu); Kaixuan Huang, Princeton University (kaixuanh@princeton.edu); Mengdi Wang, Princeton University (mengdiw@princeton.edu)
Pseudocode | Yes | Algorithm 1 Speculative Decoding [10, 24] (a sketch of this algorithm appears after the table)
Open Source Code | No | To implement Decoding-UNO, we modify the speculative sampling function in the Hugging Face transformers/generation/utils.py file as follows (where the variable eps is ϵ in Table 1). A sketch of such a modified acceptance rule appears after the table.
Open Datasets | Yes | We test 200 prompts from the Alpaca-Farm-Eval Dataset [13] with 500 responses/comparisons per prompt. ... We specify draft model p as pythia-70m and target model q as pythia-2.8b from EleutherAI [7]. A model-loading sketch appears after the table.
Dataset Splits | No | The paper does not explicitly provide training/validation/test dataset splits; it only mentions testing 200 prompts from the Alpaca-Farm-Eval Dataset.
Hardware Specification | Yes | This is conducted on a single A100 GPU.
Software Dependencies | No | The paper mentions using Hugging Face models and PyTorch functions, but does not provide specific version numbers for these software dependencies (e.g., the PyTorch version or the Hugging Face Transformers library version).
Experiment Setup | Yes | Table 1: Win Rate for Decoding-OPT vs Decoding-UNO with different over-acceptance thresholds ϵ. The acceptance probability is b(x) = min{1, (q(x) + ϵ) / p(x)}.
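
For reference, the quoted Algorithm 1 is the standard draft-then-verify loop of speculative decoding. The sketch below is a minimal toy implementation of one verification step over explicit next-token distributions; it is not the authors' code, and the function name speculative_step and the toy distributions are illustrative.

```python
# Minimal sketch of one draft-then-verify step of speculative decoding,
# over explicit toy distributions rather than real models.
import numpy as np

rng = np.random.default_rng(0)

def speculative_step(p, q):
    """One speculative decoding step.

    p: draft-model next-token distribution (1-D array summing to 1)
    q: target-model next-token distribution (same shape)
    Returns a token index whose marginal distribution is exactly q.
    """
    x = rng.choice(len(p), p=p)               # draft model proposes x ~ p
    if rng.random() < min(1.0, q[x] / p[x]):  # accept with prob min{1, q(x)/p(x)}
        return x                              # accepted: keep the draft token
    residual = np.maximum(q - p, 0.0)         # rejected: resample from (q - p)_+
    return rng.choice(len(q), p=residual / residual.sum())

# Toy check: the empirical output law matches q, not p.
p = np.array([0.7, 0.2, 0.1])  # draft distribution
q = np.array([0.4, 0.4, 0.2])  # target distribution
samples = [speculative_step(p, q) for _ in range(100_000)]
print(np.bincount(samples, minlength=3) / len(samples))  # ~ [0.4, 0.4, 0.2]
```

The rejection branch resamples from the normalized residual (q − p)₊, which is what makes the output distribution exactly q regardless of the draft model.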
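The Decoding-UNO modification quoted in the Open Source Code row replaces this exact acceptance rule with the over-acceptance rule from Table 1, b(x) = min{1, (q(x) + ϵ) / p(x)}. The following is a hedged sketch of that rule in the same toy setting; it is not the authors' actual patch to transformers/generation/utils.py, and the residual fallback is assumed to stay unchanged.

```python
# Sketch of the Table 1 over-acceptance rule b(x) = min{1, (q(x) + eps) / p(x)},
# under the assumption that the rejection fallback is unchanged.
import numpy as np

rng = np.random.default_rng(1)

def uno_step(p, q, eps):
    """Draft-then-verify step with over-acceptance threshold eps >= 0."""
    x = rng.choice(len(p), p=p)                       # draft proposes x ~ p
    if rng.random() < min(1.0, (q[x] + eps) / p[x]):  # b(x) from Table 1
        return x
    residual = np.maximum(q - p, 0.0)                 # fallback (assumption)
    return rng.choice(len(q), p=residual / residual.sum())
```

With eps = 0 this reduces to exact speculative decoding; a larger eps accepts more draft tokens, trading output fidelity for speed, which is the quality-acceleration tradeoff the win rates in Table 1 measure.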
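Finally, a minimal sketch of pairing the reported draft and target models using Hugging Face Transformers' built-in assisted generation. The checkpoint ids EleutherAI/pythia-70m and EleutherAI/pythia-2.8b are the real models cited in the paper, but the prompt and generation settings are placeholders, and this uses the library's stock speculative sampling rather than the paper's modified version.

```python
# Sketch: pythia-70m drafting for pythia-2.8b via Transformers'
# assisted generation. Prompt and settings are illustrative only.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("EleutherAI/pythia-2.8b")
target = AutoModelForCausalLM.from_pretrained("EleutherAI/pythia-2.8b")
draft = AutoModelForCausalLM.from_pretrained("EleutherAI/pythia-70m")

inputs = tokenizer("The theory of speculative decoding", return_tensors="pt")
out = target.generate(**inputs, assistant_model=draft,
                      do_sample=True, max_new_tokens=64)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```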