Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
Accelerating Diffusion LLMs via Adaptive Parallel Decoding
Authors: Daniel Israel, Guy Van den Broeck, Aditya Grover
NeurIPS 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We present empirical evidence showing that APD achieves substantially higher throughput compared to existing LLM decoding strategies, all while incurring only minimal degradations in quality across a range of downstream benchmark tasks. The subsequent sections will discuss the technical details of APD, present a comprehensive set of experiments validating our claims, and discuss the broader implications of our findings for the future of efficient LLM generation. 4 Experiments |
| Researcher Affiliation | Academia | Daniel Israel Department of Computer Science University of California, Los Angeles EMAIL Guy Van den Broeck Department of Computer Science University of California, Los Angeles EMAIL Aditya Grover Department of Computer Science University of California, Los Angeles EMAIL |
| Pseudocode | Yes | Algorithm 1 Adaptive Parallel Decoding 1: Input: Diffusion model p_D, Autoregressive model p_AR, Mixture Weight Parameter R, Maximum sequence length n 2: Output: Generated token sequence x 3: x ← () ▷ Stores the accepted tokens 4: t ← 1 ▷ Index of token to generate 5: while t ≤ n do 6: marginal_logits_{t:n} ← p_D(x_{t:n} \| x_{<t}) 7: r ∼ Gumbel(0, 1) 8: x_{t:n} ← sample_gumbel(marginal_logits_{t:n}, r) 9: joint_logits_{t:n} ← p_AR(x_{t:n} \| x_{<t}) 10: product_logits_{t:n} ← softmax(R · marginal_logits_{t:n} + (1 − R) · joint_logits_{t:n}) 11: y_{t:n} ← sample_gumbel(product_logits_{t:n}, r) 12: k ← sum(cumprod(x_{t+1:n} = y_{t+1:n})) + 1 13: x ← concat(x, x_{t:t+k−1}) ▷ Append accepted tokens 14: t ← t + k 15: end while 16: return x |
| Open Source Code | Yes | The code to reproduce the method and experiments is available publicly at https://github.com/danielmisrael/apd |
| Open Datasets | Yes | We evaluate on GSM8K [6], GPQA [33], MATH [15], and HumanEval [4]. |
| Dataset Splits | Yes | We operate using the LM Evaluation Harness [11] standard implementation of benchmarks with a few modifications and evaluate on GSM8K [6], GPQA [33], MATH [15], and HumanEval [4]. Each plot shows the Grade School Math 8K (GSM8K) [6] accuracy with 500 samples. |
| Hardware Specification | Yes | For the following experiments, we load the models in BF16 precision and run them on a single NVIDIA 24GB A5000 GPU connected to a Colfax CX41060s-EK9 4U Rackmount Server with AMD EPYC (Genoa) 9124 processors. |
| Software Dependencies | No | We operate using the LM Evaluation Harness [11] standard implementation of benchmarks with a few modifications |
| Experiment Setup | Yes | When sampling from Dream 7B, we use the hyperparameters of temperature 0.2 and top-p 0.95, as these are set as default. Also, we set a maximum generation length of 256 or 512 tokens for the diffusion models and 16384 (the maximum context length) for the autoregressive Qwen models. |
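
The control flow of Algorithm 1 quoted in the Pseudocode row can be illustrated with a small, self-contained sketch. This is not the authors' implementation (that lives in the linked repository). The two model functions, `toy_diffusion_logits` and `toy_ar_logits`, the vocabulary size `VOCAB`, and the `seed` argument are hypothetical stand-ins chosen so that the loop runs without any real model. The sketch also skips the explicit softmax, on the assumption that Gumbel-max sampling is unaffected by a per-position normalization constant.

```python
import math
import random

VOCAB = 5  # hypothetical toy vocabulary size


def toy_diffusion_logits(prefix, length):
    """Hypothetical stand-in for p_D: per-position marginal logits."""
    base = len(prefix)
    return [[math.sin(1.0 + base + i + v) for v in range(VOCAB)]
            for i in range(length)]


def toy_ar_logits(prefix, proposal):
    """Hypothetical stand-in for p_AR: logits at each position,
    conditioned on the prefix plus the proposed tokens before it."""
    context = list(prefix)
    rows = []
    for tok in proposal:
        prev = context[-1] if context else 0
        rows.append([math.cos(prev + v) for v in range(VOCAB)])
        context.append(tok)
    return rows


def gumbel_noise(rng, length):
    """Draw Gumbel(0, 1) noise for every position and vocabulary entry."""
    return [[-math.log(-math.log(max(rng.random(), 1e-12)))
             for _ in range(VOCAB)] for _ in range(length)]


def sample_gumbel(logits, noise):
    """Gumbel-max sampling: argmax of logits + noise at each position."""
    return [max(range(VOCAB), key=lambda v: row[v] + g[v])
            for row, g in zip(logits, noise)]


def apd_decode(n, R, seed=0):
    """Generate n tokens; return (tokens, number of decoding iterations)."""
    rng = random.Random(seed)
    x = []      # accepted tokens
    steps = 0
    while len(x) < n:
        remaining = n - len(x)
        marginal = toy_diffusion_logits(x, remaining)
        r = gumbel_noise(rng, remaining)        # noise shared by both samples
        proposal = sample_gumbel(marginal, r)   # x_{t:n}
        joint = toy_ar_logits(x, proposal)
        product = [[R * m + (1 - R) * j for m, j in zip(m_row, j_row)]
                   for m_row, j_row in zip(marginal, joint)]
        y = sample_gumbel(product, r)           # y_{t:n}
        # k = sum(cumprod(x_{t+1:n} == y_{t+1:n})) + 1
        k = 1
        for a, b in zip(proposal[1:], y[1:]):
            if a != b:
                break
            k += 1
        x.extend(proposal[:k])                  # append accepted tokens
        steps += 1
    return x, steps


if __name__ == "__main__":
    tokens, steps = apd_decode(n=12, R=0.5)
    print(f"generated {len(tokens)} tokens in {steps} iterations")
```

In this sketch, setting `R = 1.0` makes the product logits equal to the marginal logits, so every proposed token is accepted in a single iteration. Smaller values of `R` give the autoregressive stand-in more weight and typically accept fewer tokens per iteration.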