Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
EasySpec: Layer-Parallel Speculative Decoding for Efficient Multi-GPU Utilization
Authors: Yize Wu, Ke Gao, Ling Li, Yanjun Wu
NeurIPS 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluated EasySpec on several mainstream open-source LLMs, using smaller versions of models from the same series as drafters. The results demonstrate that EasySpec can achieve a peak speedup of 4.17x compared to vanilla decoding, while preserving the original distributions of the base LLMs. Specifically, the drafting stage can be accelerated by up to 1.62x with a maximum speculation accuracy drop of only 7%. |
| Researcher Affiliation | Academia | Yize Wu¹˒², Ke Gao¹, Ling Li¹˒², Yanjun Wu¹. ¹Intelligent Software Research Center, Institute of Software, CAS, Beijing, China; ²University of Chinese Academy of Sciences, Beijing, China |
| Pseudocode | Yes | Algorithm 1 (Layer-Sequential Speculation). Input: hidden state `h`; N consecutive attention layers `Attn_1, ..., Attn_N` and MLP layers `MLP_1, ..., MLP_N`. Set `h_1 = h`. For `i = 1` to `N`: `Attnoutput_i = Attn_i(h_i)` (sequential); `h'_i = h_i + Attnoutput_i`; `MLPoutput_i = MLP_i(h'_i)`; `h_{i+1} = h'_i + MLPoutput_i`. End for. |
| Open Source Code | Yes | The code is available at https://github.com/Yize-Wu/EasySpec. |
| Open Datasets | Yes | The benchmarks include a variety of tasks: language understanding (MMLU [20]), code generation (HumanEval [21]), math reasoning (MATH [22]), instruction following (IFEval [23]), and multilingual language usage (MGSM [24]). We also evaluated our method on Spec-Bench [25] for comparison with related work. |
| Dataset Splits | No | The paper evaluates on standard benchmarks (MMLU, HumanEval, MATH, IFEval, MGSM, Spec-Bench) but does not explicitly describe the training/validation/test splits used in its experiments, nor does it give percentages, sample counts, or references explaining how the benchmark data were partitioned for evaluation. |
| Hardware Specification | Yes | The experiments were conducted on 8 A100 GPUs. |
| Software Dependencies | No | The paper does not list software dependencies with version numbers, such as the programming languages, libraries, or frameworks used to implement EasySpec. |
| Experiment Setup | Yes | Unless specified otherwise, the layer-parallel size N is set to 4, as it is optimal for most of the tested models. The tensor-parallel sizes for the drafter and base models are 1 and 8, respectively, as these are optimal in most cases (see Table 6). The speculation length is 5, which is also the optimal value. All experiments were conducted using chain-of-thought reasoning with a maximum of 128 tokens. |
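
To make the Algorithm 1 pseudocode concrete, here is a minimal pure-Python sketch of layer-sequential speculation over toy vectors. `layer_sequential` follows the recurrence listed in the table. `layer_parallel` is an assumption, not taken from this page: it supposes that layer-parallel speculation lets all N attention layers in a group read the same input hidden state, which removes the dependency between them so they could run on different GPUs. The `scale` helper is a hypothetical stand-in for real attention and MLP layers.

```python
from typing import Callable, List, Sequence

Vector = List[float]
Layer = Callable[[Vector], Vector]


def add(a: Vector, b: Vector) -> Vector:
    """Element-wise residual addition."""
    return [x + y for x, y in zip(a, b)]


def layer_sequential(h: Vector, attns: Sequence[Layer], mlps: Sequence[Layer]) -> Vector:
    """Algorithm 1: attention layer i reads the output of block i-1."""
    for attn, mlp in zip(attns, mlps):
        h_mid = add(h, attn(h))     # h'_i = h_i + Attn_i(h_i)
        h = add(h_mid, mlp(h_mid))  # h_{i+1} = h'_i + MLP_i(h'_i)
    return h


def layer_parallel(h: Vector, attns: Sequence[Layer], mlps: Sequence[Layer]) -> Vector:
    """ASSUMPTION (not from the source): all N attention layers read the same
    group input h, so their outputs are independent and could run concurrently."""
    attn_outputs = [attn(h) for attn in attns]  # no dependency between entries
    for attn_out, mlp in zip(attn_outputs, mlps):
        h_mid = add(h, attn_out)    # residual add still uses the running state
        h = add(h_mid, mlp(h_mid))  # MLPs are still applied in order
    return h


def scale(c: float) -> Layer:
    """Hypothetical toy layer: multiplies every element by c."""
    return lambda v: [c * x for x in v]


if __name__ == "__main__":
    attns = [scale(0.5), scale(0.5)]
    mlps = [scale(0.0), scale(0.0)]
    print(layer_sequential([1.0], attns, mlps))  # [2.25]
    print(layer_parallel([1.0], attns, mlps))    # [2.0]
```

With N = 1 the two functions compute the same thing; with N > 1 the parallel form only approximates the sequential one, which fits the page's report of a small speculation accuracy drop (at most 7%) in exchange for a faster drafting stage.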
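
As background for the speculation length of 5 and the comparison against vanilla decoding, the following is a generic draft-then-verify loop, not EasySpec's implementation. It assumes greedy (argmax) decoding: the drafter proposes `spec_len` tokens, the base model keeps the longest prefix that matches its own choices, and every emitted token is therefore exactly what vanilla greedy decoding would produce. Preserving the base distribution under sampling would instead require a rejection-sampling acceptance rule, which is not shown. `speculative_greedy`, `greedy`, `base`, and `draft` are hypothetical names, and the toy next-token functions stand in for real LLMs.

```python
from typing import Callable, List

NextToken = Callable[[List[int]], int]


def greedy(prefix: List[int], base: NextToken, max_new_tokens: int) -> List[int]:
    """Vanilla decoding: one base-model call per generated token."""
    out = list(prefix)
    for _ in range(max_new_tokens):
        out.append(base(out))
    return out


def speculative_greedy(prefix: List[int], draft: NextToken, base: NextToken,
                       max_new_tokens: int, spec_len: int = 5) -> List[int]:
    out = list(prefix)
    target_len = len(prefix) + max_new_tokens
    while len(out) < target_len:
        # 1) The drafter proposes spec_len tokens autoregressively.
        proposal, ctx = [], list(out)
        for _ in range(spec_len):
            token = draft(ctx)
            proposal.append(token)
            ctx.append(token)
        # 2) The base model verifies (one batched pass in practice; a loop here).
        ctx = list(out)
        for token in proposal:
            expected = base(ctx)
            if expected != token:
                ctx.append(expected)  # first mismatch: keep the base model's token
                break
            ctx.append(token)          # match: accept the drafted token
        else:
            ctx.append(base(ctx))      # all drafts accepted: one extra base token
        out = ctx[:target_len]         # each round adds at least one token
    return out


def base(ctx: List[int]) -> int:
    """Hypothetical toy base model: next token is (last + 1) mod 10."""
    return (ctx[-1] + 1) % 10


def draft(ctx: List[int]) -> int:
    """Hypothetical toy drafter: agrees with base except when len(ctx) % 4 == 0."""
    return base(ctx) if len(ctx) % 4 else 0


if __name__ == "__main__":
    print(speculative_greedy([3], draft, base, 12))
    # [3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5]
```

The speedup in such a scheme comes from accepting several drafted tokens per base-model verification; a less accurate drafter only lowers the number of accepted tokens per round, never the correctness of the output.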