Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Breaking AR's Sampling Bottleneck: Provable Acceleration via Diffusion Language Models
Authors: Gen Li, Changxiao Cai
NeurIPS 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we present numerical experiments to validate our convergence theory developed in Section 3... Collectively, these numerical studies confirm our main theoretical findings |
| Researcher Affiliation | Academia | Gen Li (Chinese University of Hong Kong, EMAIL); Changxiao Cai (University of Michigan, EMAIL) |
| Pseudocode | No | The paper describes the forward process, training, and sampling procedure using descriptive text and mathematical equations, but does not include a dedicated pseudocode or algorithm block. |
| Open Source Code | No | The paper does not explicitly state that source code for the described methodology is publicly available, nor does it provide any links to code repositories. |
| Open Datasets | No | For the data distribution $p_{\mathrm{data}}$ of text $X = (X^{(1)}, \dots, X^{(L)})$, we consider a $K$-state Potts chain of length $L$ with coupling parameter $J$. Specifically, $X^{(1)} \sim \mathrm{Unif}([K])$ and, for $i \ge 2$, $\mathbb{P}\{X^{(i)} = y \mid X^{(i-1)} = x\} = \frac{\exp(J\,\mathbf{1}\{x = y\})}{\exp(J) + K - 1}$ for $x, y \in [K]$. This construction allows us to compute explicitly the mutual information $I(X^{(i)}; X^{(-i)})$, the optimal mask predictor $p(\cdot \mid X_t)$, and the distributions of both the data $p_{X_0}$ and the generated sample $p_{Y_0 \mid M}$. |
| Dataset Splits | No | The expectation in the KL divergence, taken over both the mask schedule $M$ and the data distribution $p_{X_0}$, is approximated via Monte Carlo simulations. |
| Hardware Specification | No | The paper describes numerical experiments in Section 5 but does not specify the hardware used for these computations. |
| Software Dependencies | No | The paper describes numerical experiments in Section 5 but does not specify any software dependencies or their versions. |
| Experiment Setup | Yes | Set $K = 10$ and $L = 100$. Figure 1 (a) presents the sampling error (in KL divergence) vs. the number of iterations $T$. As shown, the slope in the log-log plot is very close to $-1$, demonstrating that the sampling error scales proportionally to $1/T$. In addition, Figure 1 (b) plots the KL sampling error vs. the mutual information (controlled by $J$). One can see that the sampling error increases approximately linearly with the mutual information. We implement the sampling process using the optimal mask predictor $p(\cdot \mid X_t)$ and a balanced mask schedule where the number of unmasked tokens is the same at each iteration. |
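The Potts-chain data distribution described in the Open Datasets row is simple enough to simulate directly. The Python sketch below (standard library only) samples such a chain and computes the mutual information $I(X^{(i)}; X^{(-i)})$ for an interior position by exact enumeration. It is an illustration under stated assumptions, not the authors' code, since the table records no public code release. The function names are hypothetical, states are 0-indexed, and the reduction of $X^{(-i)}$ to the two neighbours relies on the Markov property of the chain.

```python
import math
import random


def potts_transition(K: int, J: float) -> list[list[float]]:
    """P[x][y] = exp(J * 1{x == y}) / (exp(J) + K - 1), states 0..K-1."""
    z = math.exp(J) + K - 1
    return [[(math.exp(J) if x == y else 1.0) / z for y in range(K)]
            for x in range(K)]


def sample_potts_chain(K: int, L: int, J: float, rng: random.Random) -> list[int]:
    """Draw (X^(1), ..., X^(L)): X^(1) uniform over K states, then Potts transitions."""
    P = potts_transition(K, J)
    x = [rng.randrange(K)]
    for _ in range(L - 1):
        # Next state is drawn from the row of P indexed by the current state.
        x.append(rng.choices(range(K), weights=P[x[-1]])[0])
    return x


def interior_mutual_information(K: int, J: float) -> float:
    """I(X^(i); X^(-i)) in nats for an interior position 2 <= i <= L - 1.

    Markov property: X^(i) depends on the rest of the chain only through
    X^(i-1) and X^(i+1), so I(X^(i); X^(-i)) = I(X^(i); X^(i-1), X^(i+1)).
    Every marginal is uniform (P is symmetric, hence doubly stochastic,
    and X^(1) is uniform), so p(m) = 1/K.
    """
    P = potts_transition(K, J)
    mi = 0.0
    for left in range(K):
        for right in range(K):
            # p(left, right) = (1/K) * sum_m P[left][m] * P[m][right]
            p_lr = sum(P[left][m] * P[m][right] for m in range(K)) / K
            for m in range(K):
                p_lmr = P[left][m] * P[m][right] / K
                # Term p(l, m, r) * log[p(l, m, r) / (p(m) * p(l, r))]
                mi += p_lmr * math.log(p_lmr / (p_lr / K))
    return mi


rng = random.Random(0)
chain = sample_potts_chain(K=10, L=100, J=2.0, rng=rng)
print(len(chain), interior_mutual_information(10, 2.0))
```

Sweeping `J` in `interior_mutual_information` is one way to reproduce the horizontal axis of a plot like Figure 1 (b), since the row states that the mutual information is controlled by $J$. Boundary positions $i = 1$ and $i = L$ have only one neighbour, so this sketch covers interior positions only.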
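The Experiment Setup row mentions a balanced mask schedule and a log-log fit showing that the sampling error scales like $1/T$, which corresponds to a slope of $-1$ on log-log axes. The Python sketch below illustrates both under assumptions: which positions are unmasked at each iteration is decided here by a uniformly random shuffle (the row does not say how positions are chosen), and the slope is an ordinary least-squares fit of $\log(\text{error})$ on $\log T$. Both function names are hypothetical.

```python
import math
import random


def balanced_mask_schedule(L: int, T: int, rng: random.Random) -> list[list[int]]:
    """Split positions 0..L-1 into T groups whose sizes differ by at most one.

    Group t lists the positions unmasked at iteration t; when T divides L,
    every iteration unmasks exactly L // T tokens. The random order is an
    assumption of this sketch.
    """
    if not 1 <= T <= L:
        raise ValueError("need 1 <= T <= L")
    order = list(range(L))
    rng.shuffle(order)
    base, extra = divmod(L, T)
    groups, start = [], 0
    for t in range(T):
        size = base + (1 if t < extra else 0)
        groups.append(sorted(order[start:start + size]))
        start += size
    return groups


def loglog_slope(ts: list[float], errors: list[float]) -> float:
    """Least-squares slope of log(error) against log(T); c / T gives -1."""
    xs = [math.log(t) for t in ts]
    ys = [math.log(e) for e in errors]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den


groups = balanced_mask_schedule(L=100, T=10, rng=random.Random(0))
print([len(g) for g in groups])
# Synthetic errors proportional to 1/T (illustrative only, not data from the paper).
ts = [1, 2, 4, 8, 16]
print(loglog_slope(ts, [3.0 / t for t in ts]))
```

With $L = 100$, choosing $T$ among the divisors of 100 keeps every group the same size, matching the "same number of unmasked tokens at each iteration" condition exactly.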
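The Dataset Splits row notes that the expectation in the KL divergence is approximated by Monte Carlo simulation. As a hedged illustration of that estimator only, the Python sketch below estimates $\mathrm{KL}(p_{\mathrm{data}} \,\|\, q)$ by averaging $\log p_{\mathrm{data}}(x) - \log q(x)$ over samples $x \sim p_{\mathrm{data}}$ from the Potts chain, then compares the estimate with a closed form. The comparison distribution $q$ (uniform over all length-$L$ sequences) is a hypothetical stand-in chosen because its KL is available exactly. The paper's generated-sample distribution $p_{Y_0 \mid M}$ and the outer average over mask schedules $M$ are not reproduced here, and all function names are hypothetical.

```python
import math
import random


def sample_potts(K: int, L: int, J: float, rng: random.Random) -> list[int]:
    """Potts chain: X^(1) uniform on K states (0-indexed); stay w.p. e^J / (e^J + K - 1)."""
    stay = math.exp(J) / (math.exp(J) + K - 1)
    x = [rng.randrange(K)]
    for _ in range(L - 1):
        if rng.random() < stay:
            x.append(x[-1])
        else:
            # Uniform over the K - 1 states that differ from the current one.
            y = rng.randrange(K - 1)
            x.append(y if y < x[-1] else y + 1)
    return x


def potts_log_prob(x: list[int], K: int, J: float) -> float:
    """log p_data(x) under the same Potts chain."""
    log_z = math.log(math.exp(J) + K - 1)
    lp = -math.log(K)
    for prev, cur in zip(x, x[1:]):
        lp += (J if prev == cur else 0.0) - log_z
    return lp


def mc_kl_to_uniform(K: int, L: int, J: float, n: int, rng: random.Random) -> float:
    """Monte Carlo estimate of KL(p_data || q) with q uniform on [K]^L.

    KL(p || q) = E_{x ~ p}[log p(x) - log q(x)], and log q(x) = -L log K.
    """
    total = 0.0
    for _ in range(n):
        x = sample_potts(K, L, J, rng)
        total += potts_log_prob(x, K, J) + L * math.log(K)
    return total / n


def exact_kl_to_uniform(K: int, L: int, J: float) -> float:
    """Closed form (L - 1) * (log K - h), h = entropy of one transition row."""
    z = math.exp(J) + K - 1
    a, b = math.exp(J) / z, 1.0 / z
    h = -(a * math.log(a) + (K - 1) * b * math.log(b))
    return (L - 1) * (math.log(K) - h)


est = mc_kl_to_uniform(K=10, L=100, J=2.0, n=2000, rng=random.Random(0))
print(est, exact_kl_to_uniform(10, 100, 2.0))
```

The closed form follows from $H(p_{\mathrm{data}}) = \log K + (L - 1)h$ for a stationary chain with uniform marginals, so the gap between `est` and the exact value shows the Monte Carlo error for a given number of samples `n`.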