Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Towards Understanding Text Hallucination of Diffusion Models via Local Generation Bias
Authors: Rui Lu, Runzhe Wang, Kaifeng Lyu, Xitai Jiang, Gao Huang, Mengdi Wang
ICLR 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Through experimental probing, we consistently observe that such a phenomenon is attributed to the network's local generation bias. ... In this section, we introduce the experimental setup and results of our study on text hallucination in diffusion models. ... With LDR as a probing tool, we discover the following important observations. ... Experimental Result for UNet learning Parity Parenthesis with L = 16 (left) and L = 8 (right). ... Experimental Result for learning Quarter-MNIST using UNet (left) and DiT (right). ... A.3 EXPERIMENTAL DETAILS |
| Researcher Affiliation | Academia | 1Department of Automation, Tsinghua University; 2Electrical and Computer Engineering, Princeton University; 3Simons Institute, UC Berkeley; 4Qiuzhen College, Tsinghua University |
| Pseudocode | No | No explicit pseudocode or algorithm blocks are provided in the paper. The methodology is described through mathematical formulations and textual explanations. |
| Open Source Code | No | We also conduct the LDR analysis on real-world models such as FLUX.1 (Labs, 2024) and Stable Diffusion 3.5 (Podell et al., 2023). ... Black Forest Labs. Flux. https://github.com/black-forest-labs/flux, 2024. [This refers to a third-party model used for analysis, not code for the authors' own methodology.] |
| Open Datasets | Yes | Quarter-MNIST: each sample image consists of four MNIST digits in the corners, and the sum of the first row equals the second. ... We combine four MNIST digit images into a whole figure. |
| Dataset Splits | Yes | Quarter-MNIST: ... We randomly leave out 200 combinations as the test set and render the images of the rest. ... Parity Parenthesis: ... For L = 8 we use half of the valid parity images, and 5% for L = 16. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU models, CPU types, or memory specifications used for running the experiments. |
| Software Dependencies | No | The paper mentions using the 'Adam optimizer' and specific model architectures such as 'UNet' and 'DiT', but does not specify any software libraries or their version numbers. |
| Experiment Setup | Yes | We train with the Adam optimizer with lr = 8e-5, batch size bs = 16, and total training schedules ranging from 160k to 700k iterations. |
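The Parity Parenthesis split quoted under "Dataset Splits" (half of the valid parity strings for L = 8, 5% for L = 16) can be sketched in pure Python. This is a minimal illustration, not the authors' code; it assumes a "valid parity image" corresponds to a length-L binary string with an even number of ones and that the training subset is drawn uniformly at random (function names and the seed are illustrative):

```python
import itertools
import random

def parity_strings(L):
    # All length-L binary strings with an even number of 1s ("valid parity").
    return [bits for bits in itertools.product((0, 1), repeat=L)
            if sum(bits) % 2 == 0]

def train_split(L, fraction, seed=0):
    # Keep a random fraction of valid strings for training, as quoted in the
    # paper: 1/2 for L = 8 and 5% for L = 16. The rest would be held out.
    valid = parity_strings(L)
    rng = random.Random(seed)
    rng.shuffle(valid)
    k = int(len(valid) * fraction)
    return valid[:k]

# For L = 8 there are 2^7 = 128 valid parity strings, so a half
# fraction yields 64 training examples.
train8 = train_split(8, 0.5)
```

Each retained string would then be rendered as an image before training the diffusion model; that rendering step is not reconstructed here.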