Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

Accelerating Diffusion LLMs via Adaptive Parallel Decoding

Authors: Daniel Israel, Guy Van den Broeck, Aditya Grover

NeurIPS 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We present empirical evidence showing that APD achieves substantially higher throughput compared to existing LLM decoding strategies, all while incurring only minimal degradations in quality across a range of downstream benchmark tasks. The subsequent sections will discuss the technical details of APD, present a comprehensive set of experiments validating our claims, and discuss the broader implications of our findings for the future of efficient LLM generation. 4 Experiments
Researcher Affiliation | Academia | Daniel Israel Department of Computer Science University of California, Los Angeles EMAIL Guy Van den Broeck Department of Computer Science University of California, Los Angeles EMAIL Aditya Grover Department of Computer Science University of California, Los Angeles EMAIL
Pseudocode | Yes | Algorithm 1 Adaptive Parallel Decoding. 1: Input: Diffusion model p_D, Autoregressive model p_AR, Mixture Weight Parameter R, Maximum sequence length n 2: Output: Generated token sequence x 3: x ← () // Stores the accepted tokens 4: t ← 1 // Index of token to generate 5: while t ≤ n do 6: marginal_logits_{t:n} ← p_D(x_{t:n} | x_{<t}) 7: r ∼ Gumbel(0, 1) 8: x_{t:n} ← sample_gumbel(marginal_logits_{t:n}, r) 9: joint_logits_{t:n} ← p_AR(x_{t:n} | x_{<t}) 10: product_logits_{t:n} ← softmax(R · marginal_logits_{t:n} + (1 − R) · joint_logits_{t:n}) 11: y_{t:n} ← sample_gumbel(product_logits_{t:n}, r) 12: k ← sum(cumprod(x_{t+1:n} = y_{t+1:n})) + 1 13: x ← concat(x, x_{t:t+k−1}) // Append accepted tokens 14: t ← t + k 15: end while 16: return x
Open Source Code | Yes | The code to reproduce the method and experiments is available publicly at https://github.com/danielmisrael/apd
Open Datasets | Yes | We evaluate on GSM8K [6], GPQA [33], and MATH [15], and Human Eval [4].
Dataset Splits | Yes | We operate using the LM Evaluation Harness [11] standard implementation of benchmarks with a few modifications and evaluate on GSM8K [6], GPQA [33], and MATH [15], and Human Eval [4]. Each plot shows the Grade School Math 8K (GSM8K) [6] accuracy with 500 samples
Hardware Specification | Yes | For the following experiments, we load the models in BF16 precision and run them on single NVIDIA 24GB A5000 GPU connected to a Colfax CX41060s-EK9 4U Rackmount Server with AMD EPYC (Genoa) 9124 processors.
Software Dependencies | No | We operate using the LM Evaluation Harness [11] standard implementation of benchmarks with a few modifications
Experiment Setup | Yes | When sampling from Dream 7B, we use the hyperparameters of temperature 0.2 and top-p 0.95, as these are set as default. Also, we set a maximum generation length of 256 or 512 tokens for the diffusion models and 16384 (the maximum context length) for the autoregressive Qwen models.
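
The control flow of Algorithm 1, quoted in the Pseudocode entry above, can be illustrated with a small NumPy sketch. This is not the authors' implementation (see the linked repository for that); it is a minimal toy under stated assumptions. The functions `toy_models`, `p_d`, and `p_ar` are hypothetical stand-ins for the diffusion and autoregressive models, the vocabulary is tiny, and the softmax is applied in log space, which leaves the Gumbel-max argmax unchanged.

```python
import numpy as np


def sample_gumbel(logits, noise):
    """Gumbel-max sampling: per position, argmax over the vocabulary of logits + noise."""
    return np.argmax(logits + noise, axis=-1)


def log_softmax(z):
    """Numerically stable log-softmax over the last axis."""
    z = z - z.max(axis=-1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=-1, keepdims=True))


def toy_models(vocab_size, seed=0):
    """Hypothetical stand-ins for p_D and p_AR (assumption, not from the paper)."""
    rng = np.random.default_rng(seed)
    unigram = rng.normal(size=vocab_size)               # position-independent marginals
    bigram = rng.normal(size=(vocab_size, vocab_size))  # next-token logits given previous token

    def p_d(prefix, m):
        # Marginal logits for the m remaining positions, shape (m, vocab_size).
        return np.tile(unigram, (m, 1))

    def p_ar(prefix, proposal):
        # Logits for each proposed position given the token just before it.
        prev = [prefix[-1] if prefix else 0] + [int(tok) for tok in proposal[:-1]]
        return bigram[np.array(prev)]

    return p_d, p_ar


def adaptive_parallel_decode(p_d, p_ar, R, n, vocab_size, rng):
    """Toy version of the loop in Algorithm 1; returns (tokens, number of iterations)."""
    x = []      # accepted tokens
    steps = 0
    while len(x) < n:
        m = n - len(x)                                  # positions t..n still to fill
        marginal_logits = p_d(x, m)
        r = rng.gumbel(size=(m, vocab_size))            # noise shared by both samples
        proposal = sample_gumbel(marginal_logits, r)
        joint_logits = p_ar(x, proposal)
        product_logits = log_softmax(R * marginal_logits + (1 - R) * joint_logits)
        y = sample_gumbel(product_logits, r)
        # Accept the first token, plus the longest run of later positions that agree.
        matches = (proposal[1:] == y[1:]).astype(int)
        k = int(np.cumprod(matches).sum()) + 1
        x.extend(int(tok) for tok in proposal[:k])
        steps += 1
    return x, steps


if __name__ == "__main__":
    p_d, p_ar = toy_models(vocab_size=8)
    tokens, steps = adaptive_parallel_decode(
        p_d, p_ar, R=0.5, n=12, vocab_size=8, rng=np.random.default_rng(1)
    )
    print(tokens, steps)
```

In this sketch, R = 1 makes the product distribution coincide with the diffusion marginals, so every proposed token is accepted in a single iteration. Smaller values of R give the autoregressive model more weight and typically shorten the accepted run per iteration.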