Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Online Speculative Decoding
Authors: Xiaoxuan Liu, Lanxiang Hu, Peter Bailis, Alvin Cheung, Zhijie Deng, Ion Stoica, Hao Zhang
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We develop a prototype of online speculative decoding based on knowledge distillation and evaluate it using both synthetic and real query data. The results show a substantial increase in the token acceptance rate by 0.1 to 0.65, bringing 1.42× to 2.17× latency reduction. |
| Researcher Affiliation | Collaboration | ¹UC Berkeley, ²UCSD, ³Google Inc., ⁴SJTU. Correspondence to: Hao Zhang <EMAIL>, Zhijie Deng <EMAIL>. |
| Pseudocode | Yes | Algorithm 1 Online Speculative Decoding. |
| Open Source Code | Yes | Our code is available at https://github.com/LiuXiaoxuanPKU/OSD. |
| Open Datasets | Yes | We evaluate performance across four diverse datasets: Text-to-SQL (Spider) (Yu et al., 2018), grade school math (GSM8K) (Cobbe et al., 2021), Python code generation (Code-search-Python) (Husain et al., 2019), and financial question answering (Alpaca-finance) (Bharti, 2023). |
| Dataset Splits | No | The paper mentions training and test sets but does not explicitly describe a separate validation set or its split ratios for hyperparameter tuning or model selection. |
| Hardware Specification | Yes | We conduct the experiments with llamacpp (Gerganov, 2023) on a single A100-80G. |
| Software Dependencies | No | The paper mentions "llamacpp (Gerganov, 2023)" and "Huggingface Transformer library (hft, 2023)" but does not provide specific version numbers for these software dependencies or other programming languages/libraries used. |
| Experiment Setup | Yes | In all experiments, we set the number of proposed tokens to 5 for speculative decoding. For all online experiments, we fix the update interval I at 8. |
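To give a rough sense of why a higher token acceptance rate translates into lower latency, the sketch below computes the expected number of tokens produced per target-model verification call when 5 tokens are proposed, as in the setup above. It assumes the simplified model from the standard speculative decoding analysis, in which each proposed token is accepted independently with a fixed probability `alpha`. The function name `expected_tokens_per_step` is hypothetical and not taken from the OSD code. Note that this counts tokens per verification call only; it ignores the draft model's own cost, so it is not a direct wall-clock speedup estimate.

```python
def expected_tokens_per_step(alpha: float, gamma: int = 5) -> float:
    """Expected tokens generated per target-model verification call.

    Simplifying assumption (not from the OSD paper): each of the `gamma`
    proposed tokens is accepted i.i.d. with probability `alpha`. Every
    verification call yields the accepted prefix plus one token sampled
    by the target model, giving (1 - alpha**(gamma + 1)) / (1 - alpha).
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    if gamma < 0:
        raise ValueError("gamma must be non-negative")
    if alpha == 1.0:
        # All proposals accepted, plus one extra token from the target model.
        return float(gamma + 1)
    return (1.0 - alpha ** (gamma + 1)) / (1.0 - alpha)


if __name__ == "__main__":
    # gamma = 5 mirrors "number of proposed tokens to 5" in the setup above.
    for alpha in (0.3, 0.5, 0.8):
        print(f"alpha={alpha:.1f}: {expected_tokens_per_step(alpha, 5):.2f} tokens/step")
```

Under this model, raising the acceptance rate from 0.5 to 0.8 roughly doubles the tokens obtained per expensive target-model call, which is the mechanism by which online updates to the draft model can reduce end-to-end latency.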