Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026.

Online Speculative Decoding

Authors: Xiaoxuan Liu, Lanxiang Hu, Peter Bailis, Alvin Cheung, Zhijie Deng, Ion Stoica, Hao Zhang

ICML 2024 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We develop a prototype of online speculative decoding based on knowledge distillation and evaluate it using both synthetic and real query data. The results show a substantial increase in the token acceptance rate by 0.1 to 0.65, bringing 1.42× to 2.17× latency reduction.
Researcher Affiliation | Collaboration | 1 UC Berkeley, 2 UCSD, 3 Google Inc., 4 SJTU. Correspondence to: Hao Zhang <EMAIL>, Zhijie Deng <EMAIL>.
Pseudocode | Yes | Algorithm 1 Online Speculative Decoding.
Open Source Code | Yes | Our code is available at https://github.com/LiuXiaoxuanPKU/OSD.
Open Datasets | Yes | We evaluate performance across four diverse datasets: Text-to-SQL (Spider) (Yu et al., 2018), graduate school math (Gsm8k) (Cobbe et al., 2021), Python code generation (Code-search-Python) (Husain et al., 2019), and financial question answering (Alpaca-finance) (Bharti, 2023).
Dataset Splits | No | The paper mentions training and test sets but does not explicitly describe a separate validation set or its split ratios for hyperparameter tuning or model selection.
Hardware Specification | Yes | We conduct the experiments with llamacpp (Gerganov, 2023) on a single A100-80G.
Software Dependencies | No | The paper mentions "llamacpp (Gerganov, 2023)" and "Huggingface Transformer library (hft, 2023)" but does not provide specific version numbers for these software dependencies or other programming languages/libraries used.
Experiment Setup | Yes | In all experiments, we set the number of proposed tokens to 5 for speculative decoding. For all online experiments, we fix the update interval I at 8.
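
To illustrate why a higher token acceptance rate can translate into lower latency, here is a minimal sketch based on the standard speculative decoding analysis, not on code from the paper. It assumes each proposed token is accepted independently with the same probability `alpha`, which is a simplification. The function name and the sample `alpha` values are hypothetical; only the draft length of 5 comes from the setup above.

```python
def expected_tokens_per_step(alpha: float, k: int = 5) -> float:
    """Expected tokens emitted per target-model verification pass.

    Assumption (not from the paper): every proposed token is accepted
    independently with probability `alpha`. With k proposed tokens the
    expectation is the geometric sum 1 + alpha + ... + alpha**k.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    if k < 0:
        raise ValueError("k must be non-negative")
    if alpha == 1.0:
        # All k proposals accepted, plus the one token the target model adds.
        return float(k + 1)
    return (1.0 - alpha ** (k + 1)) / (1.0 - alpha)


if __name__ == "__main__":
    # Hypothetical acceptance rates, chosen only for illustration.
    for alpha in (0.3, 0.5, 0.8):
        tokens = expected_tokens_per_step(alpha, k=5)
        print(f"alpha={alpha:.1f}: {tokens:.3f} tokens per verification pass")
```

The value grows monotonically with `alpha`, so raising the acceptance rate lets each expensive target-model pass emit more tokens. Actual latency gains also depend on the cost of running the draft model, which this sketch ignores.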