Training-Free Structured Diffusion Guidance for Compositional Text-to-Image Synthesis
Authors: Weixi Feng, Xuehai He, Tsu-Jui Fu, Varun Jampani, Arjun Reddy Akula, Pradyumna Narayana, Sugato Basu, Xin Eric Wang, William Yang Wang
ICLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results show that our method achieves more accurate attribute binding and compositionality in the generated images. We also propose a benchmark named Attribute Binding Contrast set (ABC-6K) to measure the compositional skills of T2I models. We conduct extensive experiments and analysis to identify the causes of incorrect attribute binding, which points out future directions in improving the faithfulness and compositionality of text-to-image synthesis. |
| Researcher Affiliation | Collaboration | 1University of California, Santa Barbara, 2University of California, Santa Cruz, 3Google |
| Pseudocode | Yes | Algorithm 1 Structure Diffusion Guidance. |
| Open Source Code | Yes | We release our core codebase, containing the methodology implementation, settings, and benchmarks of compositional prompts, in the supplementary materials. |
| Open Datasets | No | The paper states it uses the pre-trained Stable Diffusion model and evaluates on specific benchmarks (ABC-6K, CC-500) and 10K randomly sampled captions from MSCOCO. While these are test/evaluation datasets, the paper does not specify training or validation splits for a model trained by the authors, as their method is training-free guidance for an existing model. |
| Dataset Splits | No | As above: the method is training-free guidance applied to a pre-trained Stable Diffusion model, so the authors train no model of their own and define no training/validation splits; only evaluation sets (ABC-6K, CC-500, and 10K randomly sampled MSCOCO captions) are used. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for running the experiments (e.g., GPU/CPU models, memory specifications). |
| Software Dependencies | Yes | Throughout the experiments, we implement our method upon Stable Diffusion v1.4. |
| Experiment Setup | Yes | Throughout the experiments, we implement our method upon Stable Diffusion v1.4. ... We fix the guidance scale to 7.5 and equally weight the key-value matrices in cross-attention layers if not otherwise specified. We do not add hand-crafted prompts such as "a photo of" to the text input. We use the Stanza library (Qi et al., 2020) for constituency parsing and obtain noun phrases if not otherwise specified. |
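The "equally weight the key-value matrices" step quoted above can be sketched as follows. This is a minimal, hedged illustration (not the authors' released code): it assumes simplified single-head cross-attention, with one value matrix per parsed noun-phrase prompt variant, and the function and variable names are illustrative.

```python
import numpy as np

def structured_cross_attention(query, key, value_variants):
    """Simplified single-head cross-attention in which several value
    matrices (e.g., one per noun-phrase text embedding obtained from
    constituency parsing) are applied with equal weight.

    query:          (num_queries, d) image-feature queries
    key:            (num_tokens, d) keys from the full prompt embedding
    value_variants: list of (num_tokens, d) value matrices, one per variant
    """
    d = query.shape[-1]
    scores = query @ key.T / np.sqrt(d)
    # Numerically stable softmax over the token axis
    scores -= scores.max(axis=-1, keepdims=True)
    attn = np.exp(scores)
    attn /= attn.sum(axis=-1, keepdims=True)
    # Apply each value matrix with the shared attention map, then average
    outputs = [attn @ v for v in value_variants]
    return np.mean(outputs, axis=0)
```

Because the attention map is shared and the averaging is linear, this is equivalent to attending over the mean of the value matrices; the sketch keeps the per-variant form to mirror the description in the paper's setup.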