Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Theoretical Benefit and Limitation of Diffusion Language Model

Authors: Guhao Feng, Yihan Geng, Jian Guan, Wei Wu, Liwei Wang, Di He

NeurIPS 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | To fully validate our theoretical findings, we conduct synthetic experiments and examine MDMs trained on formal languages, including n-gram languages and Hidden Markov Models (HMMs), systematically analyzing the relationship between performance and efficiency under both TER and SER metrics. All empirical results align with our theoretical predictions.
Researcher Affiliation | Collaboration | (1) State Key Laboratory of General Artificial Intelligence, Peking University; (2) School of Mathematical Sciences, Peking University; (3) Ant Group; (4) Center for Machine Learning Research, Peking University
Pseudocode | Yes | Algorithm 1: Generate n-gram Language Model; Algorithm 2: Generate Hidden Markov Model
Open Source Code | No | We will open the code base and data when the paper is published.
Open Datasets | No | We evaluated MDMs on several formal languages: n-gram languages (with n ∈ {2, 3, 4}) and HMMs. For each language type, parameters (e.g., transition matrices, observation matrices, initial distributions) were randomly sampled. A detailed description of this generation process and examples of resulting sequences are available in Appendix F.1. These formal languages were used to generate datasets of 1,000,000 samples each, with 990,000 for training and 10,000 for validation.
Dataset Splits | Yes | These formal languages were used to generate datasets of 1,000,000 samples each, with 990,000 for training and 10,000 for validation. Datasets were generated with sequence lengths L ∈ {512, 1024, 2048}.
Hardware Specification | Yes | Efficiency is defined by the inverse of the execution time measured on 8 Nvidia RTX 4090 GPUs with Huggingface's transformers library; in our experiments on formal languages, all training was conducted on NVIDIA A100 GPUs.
Software Dependencies | No | The paper mentions software only by name, without version numbers; for example, it cites "Huggingface's transformers library" but gives no version.
Experiment Setup | Yes | Detailed architectural specifications, including layer counts, hidden dimensions, and positional encoding schemes, are provided in Table 7 (Appendix F.2). The training procedure largely followed the framework of Sahoo et al. (2024), with specific training configurations detailed in Table 8. Models were trained for 20 epochs, with convergence monitored on the validation set using perplexity.
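The Pseudocode, Open Datasets, and Dataset Splits rows describe synthetic data drawn from randomly parameterized n-gram languages and HMMs, then split 990,000/10,000 into training and validation sets. The sketch below illustrates that kind of pipeline in plain Python at a much smaller scale. It is not the paper's Algorithm 1 or 2, which are given in its Appendix F.1: the vocabulary size, number of hidden states, Dirichlet(1) sampling of every distribution, the uniformly random starting context, and all function names (`make_ngram_model`, `sample_ngram`, `make_hmm`, `sample_hmm`, `train_val_split`) are assumptions made for illustration.

```python
import random
from itertools import product


def random_distribution(size, rng, alpha=1.0):
    """Sample a probability vector from a symmetric Dirichlet(alpha)
    by normalizing independent Gamma(alpha, 1) draws (assumed prior)."""
    weights = [rng.gammavariate(alpha, 1.0) for _ in range(size)]
    total = sum(weights)
    return [w / total for w in weights]


def make_ngram_model(vocab_size, n, rng):
    """Hypothetical n-gram generator: one random next-token distribution
    for every possible context of n-1 previous tokens."""
    contexts = product(range(vocab_size), repeat=n - 1)
    return {ctx: random_distribution(vocab_size, rng) for ctx in contexts}


def sample_ngram(table, vocab_size, n, length, rng):
    # Assumption: the first n-1 tokens are drawn uniformly at random.
    seq = [rng.randrange(vocab_size) for _ in range(n - 1)]
    while len(seq) < length:
        ctx = tuple(seq[len(seq) - (n - 1):])  # last n-1 tokens
        seq.append(rng.choices(range(vocab_size), weights=table[ctx])[0])
    return seq[:length]


def make_hmm(num_states, vocab_size, rng):
    """Hypothetical HMM generator: random initial, transition,
    and observation (emission) distributions."""
    init = random_distribution(num_states, rng)
    trans = [random_distribution(num_states, rng) for _ in range(num_states)]
    emit = [random_distribution(vocab_size, rng) for _ in range(num_states)]
    return init, trans, emit


def sample_hmm(hmm, length, rng):
    init, trans, emit = hmm
    states = range(len(init))
    vocab = range(len(emit[0]))
    state = rng.choices(states, weights=init)[0]
    seq = []
    for _ in range(length):
        # Emit an observed token from the current hidden state, then move on.
        seq.append(rng.choices(vocab, weights=emit[state])[0])
        state = rng.choices(states, weights=trans[state])[0]
    return seq


def train_val_split(samples, num_val):
    """Hold out the last num_val samples for validation."""
    assert 0 < num_val < len(samples)
    return samples[:-num_val], samples[-num_val:]


if __name__ == "__main__":
    rng = random.Random(0)
    V, n, L = 8, 3, 64  # small illustrative sizes, not the paper's settings
    table = make_ngram_model(V, n, rng)
    data = [sample_ngram(table, V, n, L, rng) for _ in range(1000)]
    # Same 99% / 1% ratio as 990,000 / 10,000, scaled down.
    train, val = train_val_split(data, num_val=10)
    print(len(train), len(val), len(train[0]))  # 990 10 64
```

The demo keeps the 99%/1% train/validation ratio but uses 1,000 sequences of length 64 so it runs in about a second; the reported datasets use 1,000,000 sequences with L ∈ {512, 1024, 2048}. The same `train_val_split` call applies unchanged to sequences from `sample_hmm`.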