A Theoretical Perspective for Speculative Decoding Algorithm
Authors: Ming Yin, Minshuo Chen, Kaixuan Huang, Mengdi Wang
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical (with a supporting experiment) | Our analysis covers the theoretical limits of speculative decoding, batch algorithms, and output quality-inference acceleration tradeoffs. Our results reveal the fundamental connections between different components of LLMs via total variation distances and show how they jointly affect the efficiency of decoding algorithms. ... A simple experiment in Section 4.2 is also consistent with our theoretical finding. |
| Researcher Affiliation | Academia | Ming Yin Princeton University my0049@princeton.edu Minshuo Chen Northwestern University minshuo.chen@northwestern.edu Kaixuan Huang Princeton University kaixuanh@princeton.edu Mengdi Wang Princeton University mengdiw@princeton.edu |
| Pseudocode | Yes | Algorithm 1 Speculative Decoding [10, 24] |
| Open Source Code | No | To implement Decoding-UNO, we modify the speculative sampling function in the Hugging Face transformers/generation/utils.py file as follows (the variable eps corresponds to ϵ in Table 1). |
| Open Datasets | Yes | We test 200 prompts from Alpaca-Farm-Eval Dataset [13] with 500 responses/comparisons per prompt. ... We specify draft model p as pythia-70m and target model q as pythia-2.8b from Eleuther AI [7]. |
| Dataset Splits | No | The paper does not explicitly provide training/validation/test dataset splits. It mentions testing 200 prompts from Alpaca-Farm-Eval Dataset. |
| Hardware Specification | Yes | This is conducted in a single A100 GPU. |
| Software Dependencies | No | The paper mentions using Hugging Face models and PyTorch functions, but does not provide specific version numbers for these software dependencies (e.g., PyTorch version, Hugging Face transformers library version). |
| Experiment Setup | Yes | Table 1: Win Rate for Decoding-OPT vs Decoding-UNO with different over-acceptance thresholds ϵ. The acceptance probability b(x) = min{1, (q(x) + ϵ)/p(x)}. |
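The acceptance rule quoted in the Experiment Setup row can be sketched as follows. This is a minimal illustration, not the paper's modification of `transformers/generation/utils.py`: the function names and toy distributions are ours, and we assume the over-acceptance variant (Decoding-UNO) adds ϵ to the target probability q(x) before the usual min{1, q(x)/p(x)} test.

```python
import numpy as np

def accept_prob(p_x, q_x, eps=0.0):
    """Acceptance probability for a token x drafted by the small model.

    eps = 0 recovers standard speculative decoding: min{1, q(x)/p(x)}.
    eps > 0 sketches the over-acceptance variant (Decoding-UNO),
    which accepts more aggressively: min{1, (q(x) + eps)/p(x)}.
    """
    return min(1.0, (q_x + eps) / p_x)

def speculative_step(p, q, rng, eps=0.0):
    """One accept/reject step of speculative sampling (illustrative sketch).

    p: draft-model distribution over the vocabulary (1-D array, sums to 1)
    q: target-model distribution over the vocabulary (1-D array, sums to 1)
    """
    x = rng.choice(len(p), p=p)                # draft model proposes token x
    if rng.random() < accept_prob(p[x], q[x], eps):
        return x                               # accepted: keep the draft token
    # rejected: resample from the normalized residual max(q - p, 0),
    # which (for eps = 0) makes the output exactly distributed as q
    residual = np.maximum(q - p, 0.0)
    residual /= residual.sum()
    return rng.choice(len(q), p=residual)
```

Larger ϵ accepts more draft tokens (faster decoding) at the cost of drifting from the target distribution q, which is the quality/speed tradeoff that Table 1's win rates measure.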