Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Semi-Clairvoyant Scheduling of Speculative Decoding Requests to Minimize LLM Inference Latency
Authors: Ruixiao Li, Fahao Chen, Peng Li
IJCAI 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments show that LAPS-SD reduces inference latency by approximately 39% compared to state-of-the-art scheduling methods. We evaluate LAPS-SD using three commonly used datasets: Chatbot Instruction Prompts [Alessandro Palla, 2023], MBPP [Austin et al., 2021], and Mini Thinky [Xuan Son NGUYEN, 2024]. |
| Researcher Affiliation | Academia | 1School of Cyber Science and Engineering, Xi'an Jiaotong University 2School of Computer Science and Engineering, The University of Aizu EMAIL, EMAIL, EMAIL |
| Pseudocode | Yes | Algorithm 1 The Proposed LAPS-SD Scheduling Algorithm Input: Arrival speculative decoding requests; 1: Initialize priority queues; 2: Initialize requests' states as non-perceptible; 3: Put all requests in the queue with highest priority; 4: Schedule requests Inter Queue Schedule(); 5: procedure INTERQUEUESCHEDULE() 6: for Non-empty queue with the highest priority do 7: Schedule requests Intra Queue Schedule() 8: end for 9: end procedure 10: procedure INTRAQUEUESCHEDULE() 11: if Request becomes stable then 12: Change request's state to perceptible; 13: Predict the acceptance rate and request length; 14: Estimate the execution time; 15: end if 16: Schedule requests with semi-clairvoyant strategy; 17: end procedure |
| Open Source Code | No | The paper does not contain any explicit statements or links indicating the release of open-source code for the described methodology. |
| Open Datasets | Yes | We evaluate LAPS-SD using three commonly used datasets: Chatbot Instruction Prompts [Alessandro Palla, 2023], MBPP [Austin et al., 2021], and Mini Thinky [Xuan Son NGUYEN, 2024], following the setup in [Miao et al., 2024]. [...] [Alessandro Palla, 2023] Alessandro Palla. Chatbot instruction prompts. https://huggingface.co/datasets/alespalla/chatbot_instruction_prompts, 2023. [...] [Xuan Son NGUYEN, 2024] Xuan Son NGUYEN. MiniThinky dataset. https://huggingface.co/datasets/ngxson/MiniThinky-dataset, 2024. |
| Dataset Splits | No | The paper mentions using three datasets: Chatbot Instruction Prompts, MBPP, and Mini Thinky, but it does not provide specific details on how these datasets were split into training, validation, or test sets. |
| Hardware Specification | Yes | Environment. We evaluate LAPS-SD on an NVIDIA L20 GPU with 48GB memory. |
| Software Dependencies | Yes | The system runs Ubuntu 20.04.6 with Linux kernel version 5.15.0-91-generic, NVIDIA driver 550.120, CUDA 12.4, and cuDNN 8.6.0. The algorithm is implemented in PyTorch version 2.5.1. |
| Experiment Setup | Yes | We assume that the batch size of this serving system is set to 1 for clear presentation. |
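The two-level structure of Algorithm 1 above can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: the names (`Request`, `estimate_exec_time`, `NUM_QUEUES`, the stability threshold) are assumptions, and the predictors for acceptance rate and request length are stubbed out. Requests arrive non-perceptible in the highest-priority queue; once a request is stable enough to predict, it becomes perceptible and is ordered by its estimated execution time, while non-perceptible requests keep arrival order (the semi-clairvoyant strategy).

```python
from dataclasses import dataclass

# Hypothetical sketch of Algorithm 1's two-level scheduling loop.
# All names and the stability threshold are illustrative assumptions.

NUM_QUEUES = 3  # priority levels; queues[0] is the highest priority


@dataclass
class Request:
    rid: int
    tokens_decoded: int = 0
    perceptible: bool = False
    est_exec_time: float = float("inf")  # unknown until perceptible


def is_stable(req: Request, threshold: int = 8) -> bool:
    # Treat a request as stable once it has decoded enough tokens for
    # its acceptance rate to be predictable (threshold is an assumed proxy).
    return req.tokens_decoded >= threshold


def estimate_exec_time(req: Request) -> float:
    # Placeholder for the paper's acceptance-rate / length predictors.
    return 1.0 + 0.1 * req.tokens_decoded


def intra_queue_schedule(queue: list[Request]) -> list[Request]:
    # Lines 11-15: promote stable requests to perceptible and estimate
    # their execution time.
    for req in queue:
        if not req.perceptible and is_stable(req):
            req.perceptible = True
            req.est_exec_time = estimate_exec_time(req)
    # Line 16, semi-clairvoyant order: perceptible requests shortest
    # estimated execution time first; non-perceptible keep arrival order.
    perceptible = sorted((r for r in queue if r.perceptible),
                         key=lambda r: r.est_exec_time)
    blind = [r for r in queue if not r.perceptible]
    return perceptible + blind


def inter_queue_schedule(queues: list[list[Request]]) -> list[Request]:
    # Lines 6-8: serve non-empty queues from highest priority downward.
    order: list[Request] = []
    for q in queues:
        if q:
            order.extend(intra_queue_schedule(q))
    return order


if __name__ == "__main__":
    # New arrivals enter the highest-priority queue as non-perceptible.
    queues: list[list[Request]] = [[] for _ in range(NUM_QUEUES)]
    for rid, steps in enumerate([10, 2, 15]):
        queues[0].append(Request(rid=rid, tokens_decoded=steps))
    print([r.rid for r in inter_queue_schedule(queues)])  # → [0, 2, 1]
```

With the assumed threshold of 8, requests 0 and 2 become perceptible and are ordered by estimated time (0 before 2), while request 1 stays non-perceptible and is served last within the queue.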