Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. [K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026].

Beyond Masked and Unmasked: Discrete Diffusion Models via Partial Masking

Authors: Chen-Hao (Lance) Chao, Wei-Fang Sun, Hanwen Liang, Chun-Yi Lee, Rahul G. Krishnan

NeurIPS 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	This section presents empirical evaluations to examine the effectiveness of the proposed method. We first report results in the text generation domain in Section 4.1. Then, we provide comparisons on the image generation benchmarks in Section 4.2. On the OpenWebText dataset [11], MDM-Prime attains an evaluation perplexity of 15.36, outperforming ARM (17.54), MDM [7, 9, 10, 12] (21.52), and their hybrid variants [12, 13] (17.58).
Researcher Affiliation	Collaboration	Chen-Hao Chao (1), Wei-Fang Sun (2), Hanwen Liang (1), Chun-Yi Lee (3), Rahul G. Krishnan (1). (1) University of Toronto, Vector Institute; (2) NVIDIA AI Technology Center; (3) National Taiwan University.
Pseudocode	No	The paper describes methods and procedures in narrative text and mathematical equations, but does not include any clearly labeled pseudocode or algorithm blocks. For example, Section 3.1 describes 'Discrete Diffusion via Partial Masking' textually.
Open Source Code	Yes	All the data used in this paper are publicly available. The code and installation instructions are available in an anonymous repository, with the link provided in Appendix A.4.
Open Datasets	Yes	On the OpenWebText dataset [11], MDM-Prime attains an evaluation perplexity of 15.36... On CIFAR-10 [15] and ImageNet-32 [16] datasets.
Dataset Splits	Yes	As OWT does not include an official validation split, we follow the procedure in [9] by reserving the last 100,000 samples for validation. The CIFAR-10 training set contains 50,000 images, while the ImageNet-32 dataset comprises 1,281,149 training images and 49,999 validation images.
Hardware Specification	Yes	The training is performed on a single NVIDIA A40 GPU with 48 GB memory. The training is performed on eight NVIDIA L40 GPUs with 48 GB memory.
Software Dependencies	No	The paper mentions software libraries like 'torchmetrics.image.fid' and 'torchmetrics.image.inception' for evaluation, and 'Adam optimizer' and 'AdamW optimizer' for training, but does not provide specific version numbers for any of these software components or underlying frameworks (e.g., Python, PyTorch).
Experiment Setup	Yes	The models are trained using the Adam optimizer [62] with a learning rate of $1 \times 10^{-3}$ and a batch size of 4,096. The network is optimized using the Adam optimizer [62] with $\beta_1 = 0.9$, $\beta_2 = 0.999$, and a learning rate of $1 \times 10^{-4}$. The scheduling function is defined as a third-order polynomial, given by $\alpha_t = (1 - t)^3$. ... The model is trained with a batch size of 512 for 4,250 epochs on CIFAR-10 and 1,000 epochs on ImageNet-32.
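
The Dataset Splits and Experiment Setup rows quote two procedures that are easy to state in code: holding out the last N samples as a validation set, and the third-order polynomial schedule. The sketch below is only an illustration of those two quoted statements, not the authors' implementation. The function names `split_last_n` and `alpha` are hypothetical, the range of t is assumed to be [0, 1], and the excerpts do not say which experiment the schedule belongs to.

```python
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_last_n(samples: Sequence[T], n_val: int) -> Tuple[List[T], List[T]]:
    """Reserve the last n_val samples for validation; the rest is training data.

    Hypothetical helper mirroring the quoted OWT procedure
    ("reserving the last 100,000 samples for validation").
    """
    if not 0 < n_val < len(samples):
        raise ValueError("n_val must be positive and smaller than the dataset")
    items = list(samples)
    return items[:-n_val], items[-n_val:]


def alpha(t: float) -> float:
    """Third-order polynomial schedule alpha_t = (1 - t)^3.

    Assumption: t lies in [0, 1]; the quoted text does not state the range.
    """
    if not 0.0 <= t <= 1.0:
        raise ValueError("t must lie in [0, 1]")
    return (1.0 - t) ** 3


if __name__ == "__main__":
    # Toy data standing in for a corpus; the real split holds out 100,000 samples.
    train, val = split_last_n(list(range(10)), n_val=3)
    print(len(train), len(val))
    print(alpha(0.0), alpha(0.5), alpha(1.0))
```

For the real corpus, the call would use `n_val=100_000`; the toy list above only keeps the example quick to run.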