Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

Reinforcing the Diffusion Chain of Lateral Thought with Diffusion Language Models

Authors: Zemin Huang, Zhiyang Chen, Zijun Wang, Tiancheng Li, Guo-Jun Qi

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Experiments on both math and code generation tasks show that using only public data and 16 H800 GPUs, DCoLT-reinforced DLMs outperform other DLMs trained by SFT or RL or even both. Notably, DCoLT-reinforced LLaDA boosts its reasoning accuracy by +9.8%, +5.7%, +11.4%, +19.5% on GSM8K, MATH, MBPP, and HumanEval.
Researcher Affiliation	Collaboration	1 Zhejiang University; 2 MAPLE Lab, Westlake University; 3 Matterwave Intelligence; 4 Institute of Advanced Technology, Westlake Institute for Advanced Study. EMAIL, EMAIL
Pseudocode	Yes	Algorithm 1: A General Framework for Training DCoLT
Open Source Code	Yes	https://github.com/maple-research-lab/LLaDOU
Open Datasets	Yes	Table 11: Reference assets and their licenses (asset: license, utility). SEDD [24]: MIT, Code & Model; GSM8K-Aug [10]: no license listed, Data; LLaDA [27]: MIT, Code & Model; MATH [16]: MIT, Data; GSM8K [8]: MIT, Data; KodCode [41]: CC BY-NC 4.0, Data.
Dataset Splits	Yes	For GSM8K, there are 7.5K questions for training and 1.32K questions for testing. For MATH, there are 7.5K questions for training and 5K questions for testing.
Hardware Specification	Yes	Experiments on both math and code generation tasks show that using only public data and 16 H800 GPUs, DCoLT-reinforced DLMs outperform other DLMs trained by SFT or RL or even both.
Software Dependencies	No	The paper mentions using the AdamW optimizer but does not specify version numbers for key software components like deep learning frameworks (e.g., PyTorch, TensorFlow) or programming language versions.
Experiment Setup	Yes	The model is trained with 64 prompts in a batch, each generating 16 completions to form a group for advantage calculation. We take an AdamW optimizer with a learning rate of 5e-6, and (β1, β2) = (0.9, 0.999). We do not apply the KL penalty by default, as it provides marginal benefits in our experiments. The whole training lasts for 140 iterations on 16 H800 GPUs, which takes about 63 GPU days (i.e., about 4 days on wall clock with 16 GPUs).
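
The Experiment Setup row can be checked with a little arithmetic. The sketch below records the reported hyperparameters in a hypothetical `TrainSetup` dataclass (the class and field names are illustrative, not taken from the paper or its repository) and derives the number of completions per iteration and the wall-clock estimate. The `group_advantages` helper is an assumption: the row only says that the 16 completions of a prompt "form a group for advantage calculation", so a GRPO-style mean/standard-deviation normalization is used here as one plausible reading, not as the authors' confirmed formula.

```python
from dataclasses import dataclass
from statistics import mean, pstdev


@dataclass(frozen=True)
class TrainSetup:
    """Hyperparameters as reported in the Experiment Setup row.

    The class and field names are hypothetical; only the values come from the text.
    """
    prompts_per_batch: int = 64
    completions_per_prompt: int = 16
    learning_rate: float = 5e-6
    betas: tuple = (0.9, 0.999)      # AdamW (beta1, beta2)
    kl_coefficient: float = 0.0      # KL penalty is not applied by default
    iterations: int = 140
    num_gpus: int = 16               # H800 GPUs
    gpu_days: float = 63.0           # reported as "about 63 GPU days"

    @property
    def completions_per_iteration(self) -> int:
        # One group of completions per prompt in the batch.
        return self.prompts_per_batch * self.completions_per_prompt

    @property
    def wall_clock_days(self) -> float:
        # GPU days spread evenly over all GPUs.
        return self.gpu_days / self.num_gpus


def group_advantages(rewards, eps=1e-6):
    """Normalize rewards within one group of completions.

    ASSUMPTION: GRPO-style (reward - group mean) / (group std + eps).
    The source does not state the exact normalization.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


setup = TrainSetup()
print(setup.completions_per_iteration)   # 64 * 16 completions per iteration
print(round(setup.wall_clock_days, 2))   # 63 / 16, consistent with "about 4 days"
```

Under these numbers, each iteration samples 1,024 completions, and 63 GPU days over 16 GPUs comes to just under 4 days of wall-clock time, which matches the figure quoted in the row.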