Directed Diffusion: Direct Control of Object Placement through Attention Guidance
Authors: Wan-Duo Kurt Ma, Avisek Lahiri, J. P. Lewis, Thomas Leung, W. Bastiaan Kleijn
AAAI 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Table 1 gives quantitative CLIP similarities between the embeddings of the prompts and the synthesized images. Our results are similar to or better than those of the compared methods. As described earlier, unmodified T2I methods such as Stable Diffusion are fallible, and using such a system is not simply a matter of typing a text prompt. |
| Researcher Affiliation | Collaboration | Wan-Duo Kurt Ma (1), Avisek Lahiri (2), J. P. Lewis (3)*, Thomas Leung (2), W. Bastiaan Kleijn (1,2) — (1) Victoria University of Wellington, (2) Google Research, (3) NVIDIA Research |
| Pseudocode | Yes | Algorithm 1: DD Cross-Attention Editing — 1: procedure DDCrossAttnEdit(DM(·), P, B); 2: for l ∈ layers(DM(z_t, P)) do; 3: if type(l) = CrossAttn then; 4: A = Softmax(Q_l(z_t) K_l(P)^T); 5: D^(i) ← A^(i) ⊙ W(B) + S(B), ∀i ∈ T_I; 6: a := arg min_a L_a; 7: A^(\|P\|+1:77) := D_a^(\|P\|+1:77); 8: z_t ← l(z_t, A V_l(P)); 9: else; 10: z_t ← l(z_t) |
| Open Source Code | No | The algorithm is simple to implement, requiring only a few lines of modification of a widely used library (https://github.com/huggingface/diffusers). |
| Open Datasets | No | The paper evaluates its method by generating images from text prompts and comparing them with other models using metrics like CLIP similarity, but it does not specify or provide access to a publicly available dataset of prompts or images used for its experimental evaluation. |
| Dataset Splits | No | The paper does not specify any training, validation, or test dataset splits. Their method operates on pre-trained models and generates images from prompts. |
| Hardware Specification | No | The paper does not explicitly describe the specific hardware (e.g., GPU/CPU models, memory details) used to run its experiments. It only mentions hardware used by other research for context. |
| Software Dependencies | No | Our method is implemented using the Python DIFFUSERS implementation of Stable Diffusion (SD) and uses the available pre-trained SD 1.5 model (https://github.com/huggingface/diffusers). |
| Experiment Setup | Yes | We chose N=10 in most of our experiments. ... w_r ∈ ℝ is a given weight, generally set to 0.1 in our experiments. ... N is generally set at 10 in our experiments (Ma et al. 2023). Large N better preserves the original foreground and background, while smaller N encourages more interaction between the foreground and background. |
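The cross-attention edit summarized in the Pseudocode row can be sketched in plain NumPy. This is a minimal illustration, not the authors' implementation: the function names (`gaussian_window`, `dd_edit_attention`), the fractional bounding-box convention, the Gaussian shape of the window, and the `strength` parameter (standing in for the scaling term S(B)) are all assumptions made for the sketch. The core idea it demonstrates is the one the algorithm describes: for the prompt tokens being directed, reweight their spatial attention map with a soft window W(B) centred on the target bounding box, then renormalize.

```python
import numpy as np

def gaussian_window(h, w, bbox):
    """Soft spatial window: ~1 near the centre of the box, decaying outside.
    bbox = (y0, x0, y1, x1) in pixel coordinates of the h*w attention grid.
    Tying sigma to the box half-size is an assumption of this sketch."""
    y0, x0, y1, x1 = bbox
    cy, cx = (y0 + y1) / 2.0, (x0 + x1) / 2.0
    sy, sx = max((y1 - y0) / 2.0, 1.0), max((x1 - x0) / 2.0, 1.0)
    ys, xs = np.mgrid[0:h, 0:w]
    return np.exp(-(((ys - cy) / sy) ** 2 + ((xs - cx) / sx) ** 2) / 2.0)

def dd_edit_attention(attn, bbox, token_ids, strength=1.0):
    """Edit a cross-attention map in the spirit of Algorithm 1 (hypothetical sketch).

    attn:      (h*w, n_text) attention probabilities (rows sum to 1).
    bbox:      target box on the h*w spatial grid.
    token_ids: indices of the prompt tokens to direct (the set T_I).
    strength:  additive boost inside the window, standing in for S(B)."""
    hw, n_text = attn.shape
    h = w = int(np.sqrt(hw))          # assume a square latent grid
    win = gaussian_window(h, w, bbox).reshape(-1)
    edited = attn.copy()
    for i in token_ids:
        # A^(i) <- A^(i) * W(B) + S(B): attend inside the box, suppress outside.
        edited[:, i] = edited[:, i] * win + strength * win
    # Renormalize so each spatial location's attention again sums to 1.
    edited /= edited.sum(axis=1, keepdims=True)
    return edited
```

In the real method this edit is applied inside every cross-attention layer of the denoiser for the first N steps (N=10 in most of the paper's experiments), after which ordinary denoising resumes so the directed object can blend with the background.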