Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026.
dKV-Cache: The Cache for Diffusion Language Models
Authors: Xinyin Ma, Runpeng Yu, Gongfan Fang, Xinchao Wang
NeurIPS 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | 4 Experiments 4.1 Experimental Setup We tested our method under the original evaluation benchmark of LLaDA [37] and Dream [52]. Datasets: We conduct comprehensive evaluations across a diverse set of benchmarks that assess Table 1: Benchmark results on LLaDA-8B-Instruct. |
| Researcher Affiliation | Academia | Xinyin Ma Runpeng Yu Gongfan Fang Xinchao Wang National University of Singapore EMAIL, EMAIL |
| Pseudocode | Yes | We present the pseudo-algorithm of our approach in Algorithm 1. While it largely improves inference speed over the naive implementation, the concat and reorder operations still introduce some overhead. |
| Open Source Code | Yes | The code is available at https://github.com/horseee/dKV-Cache. |
| Open Datasets | Yes | Datasets: We conduct comprehensive evaluations across a diverse set of benchmarks that assess general language understanding [23], mathematical reasoning [10, 30, 39], and code generation [8, 4]. We tested our method under the original evaluation benchmark of LLaDA [37] and Dream [52]. |
| Dataset Splits | Yes | We tested our method under the original evaluation benchmark of LLaDA [37] and Dream [52]. Datasets: We conduct comprehensive evaluations across a diverse set of benchmarks... We follow the prompt in simple-evals for LLaDA, making the model reason step by step. On Dream, we follow the evaluation setting of Dream to conduct few-shot in-context learning. |
| Hardware Specification | Yes | We tested the speed on A6000 (for LLaDA) and H20 (for Dream). |
| Software Dependencies | No | The paper describes the transformer architecture and mentions models like LLaDA and Dream, but it does not specify software dependencies with version numbers (e.g., 'PyTorch 1.9'). |
| Experiment Setup | Yes | Table 1: Benchmark results on LLaDA-8B-Instruct. We use zero-shot evaluation here. Detailed configuration is listed in the Appendix. We set the cache refresh step for dKV-Cache-Decode to be 8 and dKV-Cache-Greedy to be 2. The window size of dKV-Cache-Greedy is listed in the bracket. Table 5: Configurations of experiments on LLaDA-Instruct. MMLU L=32, T=32, B=16 T=20 2 4 T=16 8 |