Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

dKV-Cache: The Cache for Diffusion Language Models

Authors: Xinyin Ma, Runpeng Yu, Gongfan Fang, Xinchao Wang

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	4 Experiments 4.1 Experimental Setup We tested our method under the original evaluation benchmark of LLaDA [37] and Dream [52]. Datasets: We conduct comprehensive evaluations across a diverse set of benchmarks that assess Table 1: Benchmark results on LLaDA-8B-Instruct.
Researcher Affiliation	Academia	Xinyin Ma Runpeng Yu Gongfan Fang Xinchao Wang National University of Singapore EMAIL, EMAIL
Pseudocode	Yes	We present the pseudo-algorithm of our approach in Algorithm 1. While it largely improves inference speed over the naive implementation, the concat and reorder operations still introduce some overhead.
Open Source Code	Yes	The code is available at https://github.com/horseee/dKV-Cache.
Open Datasets	Yes	Datasets: We conduct comprehensive evaluations across a diverse set of benchmarks that assess general language understanding [23], mathematical reasoning [10, 30, 39], and code generation [8, 4]. We tested our method under the original evaluation benchmark of LLaDA [37] and Dream [52].
Dataset Splits	Yes	We tested our method under the original evaluation benchmark of LLaDA [37] and Dream [52]. Datasets: We conduct comprehensive evaluations across a diverse set of benchmarks... We follow the prompt in simple-evals for LLaDA, making the model reason step by step. On Dream, we follow the evaluation setting of Dream to conduct few-shot in-context learning.
Hardware Specification	Yes	We tested the speed on A6000 (for LLaDA) and H20 (for Dream).
Software Dependencies	No	The paper describes the transformer architecture and mentions models like LLaDA and Dream, but it does not specify software dependencies with version numbers (e.g., 'PyTorch 1.9').
Experiment Setup	Yes	Table 1: Benchmark results on LLaDA-8B-Instruct. We use zero-shot evaluation here. Detailed configuration is listed in the Appendix. We set the cache refresh step for dKV-Cache-Decode to be 8 and dKV-Cache-Greedy to be 2. The window size of dKV-Cache-Greedy is listed in the bracket. Table 5: Configurations of experiments on LLaDA-Instruct. MMLU L=32, T=32, B=16 T=20 2 4 T=16 8