Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Breaking AR’s Sampling Bottleneck: Provable Acceleration via Diffusion Language Models

Authors: Gen Li, Changxiao Cai

NeurIPS 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	In this section, we present numerical experiments to validate our convergence theory developed in Section 3... Collectively, these numerical studies confirm our main theoretical findings
Researcher Affiliation	Academia	Gen Li (Chinese University of Hong Kong); Changxiao Cai (University of Michigan)
Pseudocode	No	The paper describes the forward process, training, and sampling procedure using descriptive text and mathematical equations, but does not include a dedicated pseudocode or algorithm block.
Open Source Code	No	The paper does not explicitly state that source code for the described methodology is publicly available, nor does it provide any links to code repositories.
Open Datasets	No	For the data distribution $p_{\mathrm{data}}$ of text $X = (X^{(1)}, \dots, X^{(L)})$, we consider a $K$-state Potts chain of length $L$ with coupling parameter $J$. Specifically, $X^{(1)} \sim \mathrm{Unif}([K])$ and for $i \ge 2$, $\mathbb{P}\{X^{(i)} = y \mid X^{(i-1)} = x\} = \frac{\exp(J\,\mathbb{1}\{x = y\})}{\exp(J) + K - 1}$, $x, y \in [K]$. This construction allows us to compute explicitly the mutual information $I(X^{(i)}; X^{(-i)})$, the optimal mask predictor $p^\star(\cdot \mid X_t)$, and the distributions of both the data $p_{X_0}$ and the generated sample $p_{Y_0 \mid M}$.
Dataset Splits	No	The expectation in the KL divergence, taken over both the mask schedule $M$ and the data distribution $p_{X_0}$, is approximated via Monte Carlo simulations.
Hardware Specification	No	The paper describes numerical experiments in Section 5 but does not specify the hardware used for these computations.
Software Dependencies	No	The paper describes numerical experiments in Section 5 but does not specify any software dependencies or their versions.
Experiment Setup	Yes	Set $K = 10$ and $L = 100$. Figure 1 (a) presents the sampling error (in KL divergence) vs. the number of iterations $T$. As shown, the slope in the log-log plot is very close to $-1$, demonstrating that the sampling error scales proportionally to $1/T$. In addition, Figure 1 (b) plots the KL sampling error vs. the mutual information (controlled by $J$). One can see that the sampling error increases approximately linearly with the mutual information. We implement the sampling process using the optimal mask predictor $p^\star(\cdot \mid X_t)$ and a balanced mask schedule where the number of unmasked tokens is the same at each iteration.
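
Since the paper releases no code, the synthetic setup quoted in the Open Datasets and Experiment Setup rows can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names, the 0-based state labels, the value `J = 1.0`, the choice `T = 20`, and the uniformly random partition used for the balanced schedule are all assumptions made for illustration; only `K = 10`, `L = 100`, the transition rule, and the equal number of unmasked tokens per iteration come from the quoted text.

```python
import numpy as np


def potts_transition_matrix(K: int, J: float) -> np.ndarray:
    """Return the K x K matrix with P[x, y] = exp(J * 1{x == y}) / (exp(J) + K - 1)."""
    P = np.ones((K, K))
    np.fill_diagonal(P, np.exp(J))
    return P / (np.exp(J) + K - 1)


def sample_potts_chain(K: int, L: int, J: float, rng: np.random.Generator) -> np.ndarray:
    """Draw one length-L sequence; states are labeled 0..K-1 (an assumption, the paper uses [K])."""
    P = potts_transition_matrix(K, J)
    x = np.empty(L, dtype=int)
    x[0] = rng.integers(K)  # first token is uniform over the K states
    for i in range(1, L):
        x[i] = rng.choice(K, p=P[x[i - 1]])  # next token depends only on the previous one
    return x


def balanced_mask_schedule(L: int, T: int, rng: np.random.Generator) -> list:
    """Split positions 0..L-1 into T groups of equal size, one group unmasked per iteration.

    Assumption: the groups form a uniformly random partition and T divides L.
    """
    if L % T != 0:
        raise ValueError("this sketch requires T to divide L")
    perm = rng.permutation(L)
    return [np.sort(group) for group in np.split(perm, T)]


rng = np.random.default_rng(0)
sequence = sample_potts_chain(K=10, L=100, J=1.0, rng=rng)  # J = 1.0 is an arbitrary choice
schedule = balanced_mask_schedule(L=100, T=20, rng=rng)     # 5 positions unmasked per iteration
```

Each row of the transition matrix sums to one because the diagonal entry contributes `exp(J)` and the remaining `K - 1` entries contribute one each, which matches the normalizer in the quoted formula.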