Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Information-Theoretic Discrete Diffusion

Authors: Moongyu Jeon, Sangwoo Shin, Dongjae Jeon, Albert No

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We validate our framework through experiments on both synthetic and real-world datasets. First, using synthetic datasets with known ground-truth distributions, we show that our estimators accurately recover both unconditional and conditional log-likelihoods. Next, we verify the variance reduction properties of our time-free likelihood estimator and the coupled likelihood ratio estimator against their respective baselines. Finally, we demonstrate the practical utility of our approach through auditing experiments on real-world data, where conditional likelihood estimates detect out-of-distribution inputs and reveal distributional shifts in LLaDA (Nie et al., 2025). These results confirm that our information-theoretic framework not only offers theoretical insight but also enables accurate and interpretable likelihood estimation in discrete generative models.
Researcher Affiliation	Academia	¹Department of Artificial Intelligence, Yonsei University; ²Department of Computer Science, Yonsei University. Correspondence to: Albert No <EMAIL>.
Pseudocode	No	No explicit pseudocode or algorithm blocks are present in the paper. The methods are described mathematically and in prose.
Open Source Code	Yes	The code is publicly available at https://github.com/Dongjae0324/infodis.
Open Datasets	Yes	We validate our framework through experiments on both synthetic and real-world datasets. First, using synthetic datasets with known ground-truth distributions, we show that our estimators accurately recover both unconditional and conditional log-likelihoods. Next, we verify the variance reduction properties of our time-free likelihood estimator and the coupled likelihood ratio estimator against their respective baselines. Finally, we demonstrate the practical utility of our approach through auditing experiments on real-world data, where conditional likelihood estimates detect out-of-distribution inputs and reveal distributional shifts in LLaDA (Nie et al., 2025). These results confirm that our information-theoretic framework not only offers theoretical insight but also enables accurate and interpretable likelihood estimation in discrete generative models.
Dataset Splits	Yes	For the unconditional NLL estimation task, we generate 128 unique DNA sequences of length 8 using the alphabet {A, T, G, C}. Each sequence is assigned a probability using a softmax over uniformly sampled scores from [0, 1), scaled by a temperature of 0.5. These probabilities define a categorical distribution over the 128 sequences, from which one million training samples are drawn. For the conditional NLL estimation task, we generate sequences using a 4th-order Markov model over the same DNA alphabet. For each 4-base context, the conditional distribution over the next base is defined by the same softmax procedure applied to independently sampled scores. This results in a valid probabilistic transition table that governs sequence generation. The model is trained on a continuous DNA sequence of total length five million. For NLL evaluation, each subsequence of length 32 is split into a 16-base prompt x_prompt and a 16-base response x_response.
Hardware Specification	Yes	All experiments were conducted on a computing node equipped with 8 NVIDIA L40S GPUs, 1 TB of system memory, and 192 CPU cores. We used single GPU.
Software Dependencies	No	The paper mentions using the AdamW optimizer and RADD model but does not specify version numbers for any software, libraries, or programming languages used in the implementation.
Experiment Setup	Yes	In the unconditional setting, the model is trained for 70,000 steps with a learning rate of 3 × 10⁻⁴ and a batch size of 512. In the conditional setting, training is performed for 80,000 steps with a learning rate of 6 × 10⁻⁴ and a batch size of 1,024.
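
The synthetic data construction quoted in the Dataset Splits row can be sketched roughly as follows. This is an illustrative reading of that description, not the authors' code (see the linked repository for that). All function names are hypothetical, and several details are assumptions: the temperature is applied by dividing the scores by 0.5, the Markov chain starts from a uniformly random 4-base context, and the length-32 evaluation subsequences are taken as non-overlapping windows.

```python
import math
import random

ALPHABET = "ATGC"


def softmax(scores, temperature=0.5):
    """Softmax of scores / temperature (dividing by the temperature is an assumption)."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp((s - m) / temperature) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def make_unconditional(rng, n_seqs=128, length=8, n_samples=1_000_000):
    """Categorical distribution over n_seqs unique DNA strings, plus samples from it."""
    # Distinct base-4 codes decode to distinct sequences, so uniqueness is guaranteed.
    codes = rng.sample(range(4 ** length), n_seqs)
    seqs = ["".join(ALPHABET[(c // 4 ** i) % 4] for i in range(length)) for c in codes]
    # Scores are drawn uniformly from [0, 1) before the tempered softmax.
    probs = softmax([rng.random() for _ in range(n_seqs)])
    samples = rng.choices(seqs, weights=probs, k=n_samples)
    return seqs, probs, samples


def make_markov(rng, order=4, total_len=5_000_000):
    """Transition table of an order-`order` Markov model and one long sampled sequence."""
    # One next-base distribution per context; contexts are encoded as base-4 integers.
    table = [softmax([rng.random() for _ in range(4)]) for _ in range(4 ** order)]
    seq = [rng.randrange(4) for _ in range(order)]  # assumed: uniform initial context
    ctx = 0
    for b in seq:
        ctx = ctx * 4 + b
    for _ in range(total_len - order):
        nxt = rng.choices(range(4), weights=table[ctx])[0]
        seq.append(nxt)
        ctx = (ctx * 4 + nxt) % (4 ** order)  # slide the context window by one base
    return table, "".join(ALPHABET[b] for b in seq)


def split_prompt_response(seq, window=32, prompt_len=16):
    """Cut seq into non-overlapping windows (an assumption) and split each in two."""
    chunks = [seq[i:i + window] for i in range(0, len(seq) - window + 1, window)]
    return [(c[:prompt_len], c[prompt_len:]) for c in chunks]
```

The defaults mirror the sizes quoted above (128 sequences of length 8, one million samples, a five-million-base sequence). Because the Markov sampler is a plain Python loop, passing a smaller `total_len` and `n_samples` with a seeded `random.Random` is more practical for a quick check. The exact probabilities are known by construction, so the ground-truth log-likelihoods needed to compare against an estimator can be read directly from `probs` and `table`.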