Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Theoretical Benefit and Limitation of Diffusion Language Model
Authors: Guhao Feng, Yihan Geng, Jian Guan, Wei Wu, Liwei Wang, Di He
NeurIPS 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | To fully validate our theoretical findings, we conduct synthetic experiments and examine MDMs trained on formal languages, including n-gram languages and Hidden Markov Models (HMMs), systematically analyzing the relationship between performance and efficiency under both TER and SER metrics. All empirical results align with our theoretical predictions |
| Researcher Affiliation | Collaboration | 1 State Key Laboratory of General Artificial Intelligence, Peking University 2 School of Mathematical Sciences, Peking University 3 Ant Group 4 Center for Machine Learning Research, Peking University |
| Pseudocode | Yes | Algorithm 1 Generate n-gram Language Model; Algorithm 2 Generate Hidden Markov Model |
| Open Source Code | No | We will open the code base and data when the paper is published. |
| Open Datasets | No | We evaluated MDMs on several formal languages: n-gram languages (with n ∈ {2, 3, 4}) and HMMs. For each language type, parameters (e.g., transition matrices, observation matrices, initial distributions) were randomly sampled. A detailed description of this generation process and examples of resulting sequences are available in Appendix F.1. These formal languages were used to generate datasets of 1,000,000 samples each, with 990,000 for training and 10,000 for validation. |
| Dataset Splits | Yes | These formal languages were used to generate datasets of 1,000,000 samples each, with 990,000 for training and 10,000 for validation. Datasets were generated with sequence lengths L ∈ {512, 1024, 2048}. |
| Hardware Specification | Yes | efficiency is defined by the inverse of the execution time measured on 8 Nvidia RTX 4090 GPUs with Huggingface's transformers library; In our experiments of formal languages, all training was conducted on NVIDIA A100 GPUs. |
| Software Dependencies | No | The paper only mentions software names without version numbers, specifically "Huggingface's transformers library" without a version. |
| Experiment Setup | Yes | Detailed architectural specifications, including layer counts, hidden dimensions, and positional encoding schemes, are provided in Table 7 (Appendix F.2). The training procedure largely followed the framework of Sahoo et al. (2024), with specific training configurations detailed in Table 8. Models were trained for 20 epochs, with convergence monitored on the validation set using perplexity. |
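
Because the code and data are not yet released, the dataset description in the table (randomly sampled HMM parameters, fixed-length sequences, a 990,000 / 10,000 train/validation split) can only be approximated. The following is a minimal sketch of that kind of pipeline, not the paper's Algorithm 2: the function names, the number of hidden states, the vocabulary size, and the way parameters are sampled (normalized uniform weights) are all assumptions made for illustration. The demo call uses 100 samples to keep the same 99:1 ratio at small scale.

```python
import random


def random_stochastic_matrix(rows, cols, rng):
    """Return a rows x cols matrix whose rows are random probability vectors.

    Assumption: rows are normalized uniform draws; the paper's sampling
    scheme (Appendix F.1) may differ.
    """
    matrix = []
    for _ in range(rows):
        weights = [rng.random() + 1e-9 for _ in range(cols)]
        total = sum(weights)
        matrix.append([w / total for w in weights])
    return matrix


def sample_hmm_sequence(init, trans, emit, length, rng):
    """Sample one observation sequence of the given length from an HMM."""
    states = range(len(init))
    symbols = range(len(emit[0]))
    state = rng.choices(states, weights=init)[0]
    sequence = []
    for _ in range(length):
        # Emit a symbol from the current hidden state, then transition.
        sequence.append(rng.choices(symbols, weights=emit[state])[0])
        state = rng.choices(states, weights=trans[state])[0]
    return sequence


def make_dataset(num_samples, num_val, length,
                 num_states=4, vocab_size=8, seed=0):
    """Generate sequences from one random HMM and split into train/validation.

    num_states and vocab_size are hypothetical defaults, not values
    taken from the paper.
    """
    rng = random.Random(seed)
    init = random_stochastic_matrix(1, num_states, rng)[0]
    trans = random_stochastic_matrix(num_states, num_states, rng)
    emit = random_stochastic_matrix(num_states, vocab_size, rng)
    data = [sample_hmm_sequence(init, trans, emit, length, rng)
            for _ in range(num_samples)]
    split = num_samples - num_val
    return data[:split], data[split:]


# Small-scale demo with the same 99:1 ratio as the reported split.
train, val = make_dataset(num_samples=100, num_val=1, length=512)
```

Scaling to the reported setting would mean calling `make_dataset(1_000_000, 10_000, L)` for each sequence length L ∈ {512, 1024, 2048}. A pure-Python sampler like this one would be slow at that size, so a vectorized implementation would likely be needed in practice.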