Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Towards Understanding Text Hallucination of Diffusion Models via Local Generation Bias

Authors: Rui Lu, Runzhe Wang, Kaifeng Lyu, Xitai Jiang, Gao Huang, Mengdi Wang

ICLR 2025 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Through experimental probing, we consistently observe that such phenomenon is attributed to the network's local generation bias. ... In this section, we introduce the experimental setup and results of our study on text hallucination in diffusion models. ... With LDR as a probing tool, we discover the following important observations. ... Experimental Result for UNet learning parity parenthesis L = 16 (left) and L = 8 (right). ... Experimental Result for learning Quarter-MNIST using UNet (left) and DiT (right). ... A.3 EXPERIMENTAL DETAILS
Researcher Affiliation | Academia | 1 Department of Automation, Tsinghua University; 2 Electrical and Computer Engineering, Princeton University; 3 Simons Institute, UC Berkeley; 4 Qiuzhen College, Tsinghua University
Pseudocode | No | No explicit pseudocode or algorithm blocks are provided in the paper. The methodology is described through mathematical formulations and textual explanations.
Open Source Code | No | We also conduct the LDR analysis on real-world models such as FLUX1 (Labs, 2024) and Stable Diffusion 3.5 (Podell et al., 2023). ... Black Forest Labs. Flux. https://github.com/black-forest-labs/flux, 2024. [This refers to a third-party model used for analysis, not code for the authors' own methodology.]
Open Datasets | Yes | Quarter MNIST, each sample image consists of four MNIST digits in the corners and the sum of first row equals the second. ... We combine four MNIST digits image to become a whole figure.
Dataset Splits | Yes | Quarter-MNIST: ... We randomly leave out 200 combinations as test set and render the images of the rest. ... Parity Parenthesis. ... For L = 8 we use half fraction of the valid parity images and 5% for L = 16.
Hardware Specification | No | The paper does not provide specific hardware details such as GPU models, CPU types, or memory specifications used for running the experiments.
Software Dependencies | No | The paper mentions using 'Adam optimizer' and specific model architectures like 'UNet' and 'DiT' but does not specify any software libraries or their version numbers.
Experiment Setup | Yes | We train with Adam optimizer with lr = 8e-5, batch size bs = 16, total schedule ranging from 160k to 700k iterations.
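The Quarter-MNIST construction and split quoted in the rows above can be sketched as follows. This is a minimal illustration, not the authors' released code: the reading of "sum of first row equals the second" as top-row digit sum equaling bottom-row digit sum, and all function names, are assumptions, and placeholder arrays stand in for real MNIST digit images.

```python
import numpy as np

def valid_combinations():
    """Enumerate digit quadruples (a, b, c, d) for the four corners.
    Assumption: the constraint 'sum of first row equals the second'
    means a + b == c + d."""
    return [(a, b, c, d)
            for a in range(10) for b in range(10)
            for c in range(10) for d in range(10)
            if a + b == c + d]

def compose_quarter_image(tiles):
    """Combine four 28x28 digit images into one 56x56 figure,
    one digit per corner (top-left, top-right, bottom-left, bottom-right)."""
    tl, tr, bl, br = tiles
    canvas = np.zeros((56, 56), dtype=np.float32)
    canvas[:28, :28] = tl
    canvas[:28, 28:] = tr
    canvas[28:, :28] = bl
    canvas[28:, 28:] = br
    return canvas

def split_combinations(combos, n_test=200, seed=0):
    """Randomly leave out 200 digit combinations as the test set,
    as described in the Dataset Splits row; the rest are rendered
    as training images."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(combos))
    test = [combos[i] for i in idx[:n_test]]
    train = [combos[i] for i in idx[n_test:]]
    return train, test
```

Splitting at the level of digit combinations (rather than rendered images) ensures the test set contains arithmetic relations never seen in training, which is what makes held-out generalization meaningful for this task.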