Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
Semi-Clairvoyant Scheduling of Speculative Decoding Requests to Minimize LLM Inference Latency
Authors: Ruixiao Li, Fahao Chen, Peng Li
IJCAI 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments show that LAPS-SD reduces inference latency by approximately 39% compared to state-of-the-art scheduling methods. We evaluate LAPS-SD using three commonly used datasets: Chatbot Instruction Prompts [Alessandro Palla, 2023], MBPP [Austin et al., 2021], and MiniThinky [Xuan Son NGUYEN, 2024]. |
| Researcher Affiliation | Academia | ¹School of Cyber Science and Engineering, Xi'an Jiaotong University; ²School of Computer Science and Engineering, The University of Aizu; EMAIL, EMAIL, EMAIL |
| Pseudocode | Yes | Algorithm 1 The Proposed LAPS-SD Scheduling Algorithm. Input: Arrival speculative decoding requests; 1: Initialize priority queues; 2: Initialize requests states as non-perceptible; 3: Put all requests in the queue with highest priority; 4: Schedule requests InterQueueSchedule(); 5: procedure INTERQUEUESCHEDULE( ) 6: for Non-empty queue with the highest queue do 7: Schedule requests IntraQueueSchedule() 8: end for 9: end procedure 10: procedure INTRAQUEUESCHEDULE( ) 11: if Request becomes stable then 12: Change request's state to perceptible; 13: Predict the acceptance rate and request length; 14: Estimate the execution time; 15: end if 16: Schedule requests with semi-clairvoyant strategy; 17: end procedure |
| Open Source Code | No | The paper does not contain any explicit statements or links indicating the release of open-source code for the described methodology. |
| Open Datasets | Yes | We evaluate LAPS-SD using three commonly used datasets: Chatbot Instruction Prompts [Alessandro Palla, 2023], MBPP [Austin et al., 2021], and MiniThinky [Xuan Son NGUYEN, 2024], following the setup in [Miao et al., 2024]. [...] [Alessandro Palla, 2023] Alessandro Palla. Chatbot instruction prompts. https://huggingface.co/datasets/alespalla/chatbot_instruction_prompts, 2023. [...] [Xuan Son NGUYEN, 2024] Xuan Son NGUYEN. Minithinky dataset. https://huggingface.co/datasets/ngxson/MiniThinky-dataset, 2024. |
| Dataset Splits | No | The paper mentions using three datasets: Chatbot Instruction Prompts, MBPP, and MiniThinky, but it does not provide specific details on how these datasets were split into training, validation, or test sets. |
| Hardware Specification | Yes | Environment. We evaluate LAPS-SD on an NVIDIA L20 GPU with 48GB memory. |
| Software Dependencies | Yes | The system runs Ubuntu 20.04.6 with Linux kernel version 5.15.0-91-generic, NVIDIA driver 550.120, CUDA 12.4, and cuDNN 8.6.0. The algorithm is implemented in PyTorch version 2.5.1. |
| Experiment Setup | Yes | We assume that the batch size of this serving system is set to 1 for clear presentation. |
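
The control flow of Algorithm 1 quoted in the Pseudocode row can be illustrated with a toy simulation. This is a minimal sketch under stated assumptions, not the authors' implementation: the `Request` fields, the `quantum` parameter, the demotion of still-unpredictable requests, and the intra-queue ordering (estimated remaining time for perceptible requests, attained service otherwise) are all hypothetical choices made for illustration, and the "predictor" simply reads the hidden true length.

```python
from dataclasses import dataclass


@dataclass(eq=False)
class Request:
    name: str
    total_steps: int                # hidden true length, known only to the toy simulator
    stable_after: int               # hypothetical: service needed before the request is "stable"
    attained: int = 0               # service received so far
    perceptible: bool = False       # all requests start as non-perceptible
    estimate: float = float("inf")  # estimated remaining time once perceptible
    level: int = 0                  # priority queue index (0 = highest priority)


def pick_next(queues):
    """Inter-queue step: serve the non-empty queue with the highest priority."""
    for queue in queues:
        if queue:
            # Intra-queue step (assumed policy): order perceptible requests by
            # their estimate and non-perceptible ones by attained service.
            return min(queue, key=lambda r: r.estimate if r.perceptible else r.attained)
    return None


def run(requests, num_queues=3, quantum=2):
    """Return request names in completion order (batch size 1, as in the setup row)."""
    queues = [[] for _ in range(num_queues)]
    queues[0].extend(requests)      # every request starts in the highest-priority queue
    finished = []
    while (req := pick_next(queues)) is not None:
        queues[req.level].remove(req)
        req.attained += min(quantum, req.total_steps - req.attained)
        if req.attained >= req.total_steps:
            finished.append(req.name)
            continue
        if not req.perceptible and req.attained >= req.stable_after:
            req.perceptible = True  # request became stable
        if req.perceptible:
            # Stand-in for predicting acceptance rate and length.
            req.estimate = req.total_steps - req.attained
        else:
            # Assumed rule: demote requests that are still unpredictable.
            req.level = min(req.level + 1, num_queues - 1)
        queues[req.level].append(req)
    return finished


if __name__ == "__main__":
    demo = [Request("long", 10, 2), Request("short", 3, 2), Request("unknown", 4, 100)]
    print(run(demo))
```

In this toy run the short request overtakes the long one as soon as both become perceptible, while the request that never stabilizes is demoted and served last. A real scheduler would replace the estimate with predictions derived from the observed acceptance rate.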