Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Towards Understanding Text Hallucination of Diffusion Models via Local Generation Bias

Authors: Rui Lu, Runzhe Wang, Kaifeng Lyu, Xitai Jiang, Gao Huang, Mengdi Wang

ICLR 2025 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Through experimental probing, we consistently observe that such phenomenon is attributed to the network's local generation bias. ... In this section, we introduce the experimental setup and results of our study on text hallucination in diffusion models. ... With LDR as a probing tool, we discover the following important observations. ... Experimental Result for UNet learning parity parenthesis L = 16 (left) and L = 8 (right). ... Experimental Result for learning Quarter-MNIST using UNet (left) and DiT (right). ... A.3 EXPERIMENTAL DETAILS
Researcher Affiliation | Academia | 1 Department of Automation, Tsinghua University; 2 Electrical and Computer Engineering, Princeton University; 3 Simons Institute, UC Berkeley; 4 Qiuzhen College, Tsinghua University
Pseudocode | No | No explicit pseudocode or algorithm blocks are provided in the paper. The methodology is described through mathematical formulations and textual explanations.
Open Source Code | No | We also conduct the LDR analysis on real-world models such as FLUX1 (Labs, 2024) and Stable Diffusion 3.5 (Podell et al., 2023). ... Black Forest Labs. Flux. https://github.com/black-forest-labs/flux, 2024. [This refers to a third-party model used for analysis, not code for the authors' own methodology.]
Open Datasets | Yes | Quarter MNIST, each sample image consists of four MNIST digits in the corners and the sum of first row equals the second. ... We combine four MNIST digits image to become a whole figure.
Dataset Splits | Yes | Quarter-MNIST: ... We randomly leave out 200 combinations as test set and render the images of the rest. ... Parity Parenthesis. ... For L = 8 we use half fraction of the valid parity images and 5% for L = 16.
Hardware Specification | No | The paper does not provide specific hardware details such as GPU models, CPU types, or memory specifications used for running the experiments.
Software Dependencies | No | The paper mentions using 'Adam optimizer' and specific model architectures like 'UNet' and 'DiT' but does not specify any software libraries or their version numbers.
Experiment Setup | Yes | We train with Adam optimizer with lr = 8e-5, batch size bs = 16, total schedule ranging from 160k to 700k iterations.
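The Quarter-MNIST construction and split quoted in the rows above can be sketched as follows. This is a minimal illustration, not the authors' released code: the reading of "sum of first row equals the second" as top-row digit sum equaling bottom-row digit sum, and all function names, are assumptions, and placeholder arrays stand in for real MNIST digit images.

```python
import numpy as np

def valid_combinations():
    """Enumerate digit quadruples (a, b, c, d) for the four corners.
    Assumption: the constraint 'sum of first row equals the second'
    means a + b == c + d."""
    return [(a, b, c, d)
            for a in range(10) for b in range(10)
            for c in range(10) for d in range(10)
            if a + b == c + d]

def compose_quarter_image(tiles):
    """Combine four 28x28 digit images into one 56x56 figure,
    one digit per corner (top-left, top-right, bottom-left, bottom-right)."""
    tl, tr, bl, br = tiles
    canvas = np.zeros((56, 56), dtype=np.float32)
    canvas[:28, :28] = tl
    canvas[:28, 28:] = tr
    canvas[28:, :28] = bl
    canvas[28:, 28:] = br
    return canvas

def split_combinations(combos, n_test=200, seed=0):
    """Randomly leave out 200 digit combinations as the test set,
    as described in the Dataset Splits row; the rest are rendered
    as training images."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(combos))
    test = [combos[i] for i in idx[:n_test]]
    train = [combos[i] for i in idx[n_test:]]
    return train, test
```

Splitting at the level of digit combinations (rather than rendered images) ensures the test set contains arithmetic relations never seen in training, which is what makes held-out generalization meaningful for this task.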