Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Semi-Clairvoyant Scheduling of Speculative Decoding Requests to Minimize LLM Inference Latency

Authors: Ruixiao Li, Fahao Chen, Peng Li

IJCAI 2025 | Venue PDF | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental Extensive experiments show that LAPS-SD reduces inference latency by approximately 39% compared to state-of-the-art scheduling methods. We evaluate LAPS-SD using three commonly used datasets: Chatbot Instruction Prompts [Alessandro Palla, 2023], MBPP [Austin et al., 2021], and Mini Thinky [Xuan Son NGUYEN, 2024].
Researcher Affiliation Academia 1School of Cyber Science and Engineering, Xi'an Jiaotong University; 2School of Computer Science and Engineering, The University of Aizu
Pseudocode Yes Algorithm 1: The Proposed LAPS-SD Scheduling Algorithm
Input: Arriving speculative decoding requests
1: Initialize priority queues;
2: Initialize request states as non-perceptible;
3: Put all requests in the queue with the highest priority;
4: Schedule requests with InterQueueSchedule();
5: procedure INTERQUEUESCHEDULE()
6:   for the non-empty queue with the highest priority do
7:     Schedule requests with IntraQueueSchedule()
8:   end for
9: end procedure
10: procedure INTRAQUEUESCHEDULE()
11:   if a request becomes stable then
12:     Change the request's state to perceptible;
13:     Predict the acceptance rate and request length;
14:     Estimate the execution time;
15:   end if
16:   Schedule requests with the semi-clairvoyant strategy;
17: end procedure
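The two-level scheduling loop above can be sketched in Python. This is a minimal illustration, not the paper's implementation: the stability threshold `STABLE_STEPS`, the `Request` fields, and the `estimate_exec_time` predictor are all hypothetical stand-ins for the paper's acceptance-rate and length prediction models.

```python
from dataclasses import dataclass

STABLE_STEPS = 4  # hypothetical threshold after which a request is "stable"

@dataclass
class Request:
    rid: int
    steps_done: int = 0              # decoding steps executed so far
    est_time: float = float("inf")   # unknown until the request is perceptible
    perceptible: bool = False

def estimate_exec_time(req: Request) -> float:
    # Placeholder for the paper's acceptance-rate / request-length predictor.
    return 1.0 + 0.1 * req.steps_done

def inter_queue_schedule(queues):
    """Inter-queue step: serve the highest-priority non-empty queue first."""
    for q in queues:  # queues[0] has the highest priority
        if q:
            return intra_queue_schedule(q)
    return None

def intra_queue_schedule(q):
    """Intra-queue step (semi-clairvoyant): requests whose behavior has
    stabilized become perceptible and are ordered by estimated execution
    time; non-perceptible requests are served in arrival order."""
    for req in q:
        if not req.perceptible and req.steps_done >= STABLE_STEPS:
            req.perceptible = True
            req.est_time = estimate_exec_time(req)
    perceptible = [r for r in q if r.perceptible]
    nxt = min(perceptible, key=lambda r: r.est_time) if perceptible else q[0]
    q.remove(nxt)
    return nxt
```

For example, with one stable request (`steps_done=5`) and one fresh request in the top queue, the stable request is promoted to perceptible and scheduled first by its estimated execution time.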
Open Source Code No The paper does not contain any explicit statements or links indicating the release of open-source code for the described methodology.
Open Datasets Yes We evaluate LAPS-SD using three commonly used datasets: Chatbot Instruction Prompts [Alessandro Palla, 2023], MBPP [Austin et al., 2021], and Mini Thinky [Xuan Son NGUYEN, 2024], following the setup in [Miao et al., 2024]. [...] [Alessandro Palla, 2023] Alessandro Palla. Chatbot instruction prompts. https://huggingface.co/datasets/alespalla/chatbot_instruction_prompts, 2023. [...] [Xuan Son NGUYEN, 2024] Xuan Son NGUYEN. MiniThinky dataset. https://huggingface.co/datasets/ngxson/MiniThinky-dataset, 2024.
Dataset Splits No The paper mentions using three datasets: Chatbot Instruction Prompts, MBPP, and Mini Thinky, but it does not provide specific details on how these datasets were split into training, validation, or test sets.
Hardware Specification Yes Environment. We evaluate LAPS-SD on an NVIDIA L20 GPU with 48GB memory.
Software Dependencies Yes The system runs Ubuntu 20.04.6 with Linux kernel version 5.15.0-91-generic, NVIDIA driver 550.120, CUDA 12.4, and cuDNN 8.6.0. The algorithm is implemented in PyTorch version 2.5.1.
Experiment Setup Yes We assume that the batch size of this serving system is set to 1 for clarity of presentation.