Training-Free Structured Diffusion Guidance for Compositional Text-to-Image Synthesis

Authors: Weixi Feng, Xuehai He, Tsu-Jui Fu, Varun Jampani, Arjun Reddy Akula, Pradyumna Narayana, Sugato Basu, Xin Eric Wang, William Yang Wang

ICLR 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Experimental results show that our method achieves more accurate attribute binding and compositionality in the generated images. We also propose a benchmark named Attribute Binding Contrast set (ABC-6K) to measure the compositional skills of T2I models. We conduct extensive experiments and analysis to identify the causes of incorrect attribute binding, which points out future directions in improving the faithfulness and compositionality of text-to-image synthesis."
Researcher Affiliation | Collaboration | 1 University of California, Santa Barbara; 2 University of California, Santa Cruz; 3 Google
Pseudocode | Yes | "Algorithm 1 Structure Diffusion Guidance."
Open Source Code | Yes | "We release our core codebase containing the methodology implementation, settings, benchmarks containing compositional prompts under supplementary materials."
Open Datasets | No | The paper uses the pre-trained Stable Diffusion model and evaluates on specific benchmarks (ABC-6K, CC-500) and 10K randomly sampled captions from MSCOCO. These are evaluation datasets only; because the method is training-free guidance for an existing model, the authors train no model and therefore identify no training data.
Dataset Splits | No | For the same reason, no training or validation splits are specified: the method applies training-free guidance to a pre-trained model, and only the evaluation sets above (ABC-6K, CC-500, 10K MSCOCO captions) are used.
Hardware Specification | No | The paper does not specify the hardware used for the experiments (e.g., GPU/CPU models or memory).
Software Dependencies | Yes | "Throughout the experiments, we implement our method upon Stable Diffusion v1.4."
Experiment Setup | Yes | "Throughout the experiments, we implement our method upon Stable Diffusion v1.4. ... We fix the guidance scale to 7.5 and equally weight the key-value matrices in cross-attention layers if not otherwise specified. We do not add hand-crafted prompts such as 'a photo of' to the text input. We use the Stanza Library (Qi et al., 2020) for constituency parsing and obtain noun phrases if not otherwise specified."
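The experiment-setup row notes that the method obtains noun phrases from the prompt via constituency parsing with Stanza. As a minimal, dependency-free sketch of that extraction step, the snippet below assumes the parse is already available as a Penn-Treebank-style bracketed string (rather than calling Stanza itself, which requires downloading its models); the `parse_tree`, `leaves`, and `noun_phrases` helpers are illustrative names, not from the paper's codebase.

```python
def parse_tree(s):
    """Parse a bracketed constituency string into (label, children) tuples."""
    tokens = s.replace("(", " ( ").replace(")", " ) ").split()

    def helper(i):
        # tokens[i] must open a constituent: "(" LABEL children ")"
        assert tokens[i] == "("
        label = tokens[i + 1]
        i += 2
        children = []
        while tokens[i] != ")":
            if tokens[i] == "(":
                child, i = helper(i)
                children.append(child)
            else:                     # bare word (leaf)
                children.append(tokens[i])
                i += 1
        return (label, children), i + 1

    tree, _ = helper(0)
    return tree

def leaves(node):
    """Return the words under a node, left to right."""
    if isinstance(node, str):
        return [node]
    _, children = node
    return [w for c in children for w in leaves(c)]

def noun_phrases(node):
    """Collect the word span of every NP subtree (outer NPs first)."""
    if isinstance(node, str):
        return []
    label, children = node
    result = [" ".join(leaves(node))] if label == "NP" else []
    for c in children:
        result += noun_phrases(c)
    return result

tree = parse_tree(
    "(ROOT (NP (NP (DT a) (JJ red) (NN car)) (CC and) "
    "(NP (DT a) (JJ white) (NN sheep))))"
)
print(noun_phrases(tree))
# ['a red car and a white sheep', 'a red car', 'a white sheep']
```

In the actual setting, each extracted noun phrase would be encoded separately by the text encoder so that attributes stay bound to the correct nouns in the cross-attention layers.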