Directed Diffusion: Direct Control of Object Placement through Attention Guidance

Authors: Wan-Duo Kurt Ma, Avisek Lahiri, J. P. Lewis, Thomas Leung, W. Bastiaan Kleijn

Venue: AAAI 2024

Each reproducibility variable below is listed with its result and the LLM response that supports it.

Research Type: Experimental
LLM Response: "Table 1 gives quantitative CLIP similarities between the embeddings of the prompts and the synthesized images. Our results are similar to or better than those of the compared methods. As described earlier, unmodified T2I methods such as Stable Diffusion are fallible, and using such a system is not simply a matter of typing a text prompt."
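
The CLIP similarity referenced above can be computed with off-the-shelf tools. The sketch below uses the Hugging Face transformers CLIP API; the checkpoint name and the clip_similarity helper are illustrative assumptions, not the authors' evaluation code.

```python
# Cosine similarity between CLIP embeddings of a prompt and a generated image.
# A minimal sketch, assuming the standard transformers CLIP API.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def clip_similarity(image: Image.Image, prompt: str) -> float:
    inputs = processor(text=[prompt], images=image,
                       return_tensors="pt", padding=True)
    with torch.no_grad():
        out = model(**inputs)
    # Normalize the projected embeddings, then take their dot product.
    img = out.image_embeds / out.image_embeds.norm(dim=-1, keepdim=True)
    txt = out.text_embeds / out.text_embeds.norm(dim=-1, keepdim=True)
    return float((img * txt).sum())
```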

Researcher Affiliation: Collaboration
LLM Response: "Wan-Duo Kurt Ma¹, Avisek Lahiri², J. P. Lewis³*, Thomas Leung², W. Bastiaan Kleijn¹,² (¹Victoria University of Wellington; ²Google Research; ³NVIDIA Research)"

Pseudocode: Yes
LLM Response: Algorithm 1, DD Cross-Attention Editing:

```
 1: procedure DDCrossAttnEdit(DM(·), P, B)
 2:     for l ∈ layers(DM(z_t, P)) do
 3:         if type(l) ∈ CrossAttn then
 4:             A := Softmax(Q_l(z_t) · K_l(P)^T)
 5:             D^(i) ← A^(i) ⊙ W(B) + S(B),  ∀ i ∈ T_I
 6:             a* := argmin_a L_a
 7:             A^(|P|+1:77) := D^(|P|+1:77)_{a*}
 8:             z_t ← l(z_t, A · V_l(P))
 9:         else
10:             z_t ← l(z_t)
```
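
To connect the pseudocode to tensors, here is a minimal PyTorch sketch of the masked edit on lines 5 and 7. It is an illustration under assumptions: the paper builds W(B) and S(B) from smooth windows over the bounding box, whereas constants stand in here, and the selection over candidate edits on line 6 is omitted.

```python
# Edit the "trailing" cross-attention maps (token positions |P|+1 .. 77):
# damp attention outside the box B and inject attention inside it.
import torch

def edit_trailing_attention(attn: torch.Tensor, box_mask: torch.Tensor,
                            prompt_len: int, strengthen: float = 1.0,
                            weaken: float = 0.0) -> torch.Tensor:
    # attn:       (heads, pixels, 77) softmaxed cross-attention maps A
    # box_mask:   (pixels,) boolean, True inside the target bounding box B
    # prompt_len: |P|, the number of real prompt tokens before the padding
    w = box_mask.float() + (~box_mask).float() * weaken  # W(B)
    s = box_mask.float() * strengthen                    # S(B)
    edited = attn.clone()
    trailing = slice(prompt_len + 1, attn.shape[-1])
    edited[..., trailing] = attn[..., trailing] * w[:, None] + s[:, None]
    return edited
```

The edited maps then replace A before the product with V_l(P), as on line 8 of the algorithm.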

Open Source Code: No
LLM Response: "The algorithm is simple to implement, requiring only a few lines of modification of a widely used library." (The footnote points to https://github.com/huggingface/diffusers.)
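
Although no code is released, the "few lines of modification" claim can be illustrated with the attention-processor hook that diffusers exposes. The sketch below follows the processor calling convention of recent diffusers versions, reuses the hypothetical edit_trailing_attention helper from above, and skips details such as resizing the box mask to every attention resolution; it is not the authors' implementation.

```python
import torch
from diffusers import StableDiffusionPipeline

class DirectedAttnProcessor:
    """Recomputes attention and applies the trailing-map edit to cross-attention."""
    def __init__(self, box_mask: torch.Tensor, prompt_len: int):
        self.box_mask = box_mask      # (pixels,) bool, True inside the box B
        self.prompt_len = prompt_len  # |P|

    def __call__(self, attn, hidden_states, encoder_hidden_states=None,
                 attention_mask=None, **kwargs):
        is_cross = encoder_hidden_states is not None
        context = encoder_hidden_states if is_cross else hidden_states
        q = attn.head_to_batch_dim(attn.to_q(hidden_states))
        k = attn.head_to_batch_dim(attn.to_k(context))
        v = attn.head_to_batch_dim(attn.to_v(context))
        probs = attn.get_attention_scores(q, k, attention_mask)
        # Edit cross-attention only, and only where the mask resolution fits.
        if is_cross and probs.shape[1] == self.box_mask.numel():
            probs = edit_trailing_attention(probs, self.box_mask, self.prompt_len)
        out = attn.batch_to_head_dim(torch.bmm(probs, v))
        return attn.to_out[1](attn.to_out[0](out))  # projection, then dropout

pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")
box_mask = torch.zeros(16 * 16, dtype=torch.bool)  # 16x16 attention grid
box_mask[: 16 * 8] = True                          # hypothetical box: top half
pipe.unet.set_attn_processor(DirectedAttnProcessor(box_mask, prompt_len=5))
```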

Open Datasets: No
LLM Response: The paper evaluates its method by generating images from text prompts and comparing them with other models using metrics such as CLIP similarity, but it does not specify or provide access to a publicly available dataset of prompts or images used in its experimental evaluation.

Dataset Splits: No
LLM Response: The paper does not specify any training, validation, or test dataset splits. The method operates on pre-trained models and generates images directly from prompts.

Hardware Specification: No
LLM Response: The paper does not explicitly describe the hardware (e.g., GPU/CPU models or memory) used to run its experiments; it mentions hardware only in the context of other research.

Software Dependencies: No
LLM Response: "Our method is implemented using the Python DIFFUSERS implementation of Stable Diffusion (SD) and uses the available pre-trained SD 1.5 model." (The footnote points to https://github.com/huggingface/diffusers.)
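
Since no versions are pinned, the stated stack amounts to diffusers plus the pre-trained SD 1.5 weights. A minimal load-and-generate sketch, assuming the runwayml/stable-diffusion-v1-5 checkpoint id and an illustrative prompt:

```python
import torch
from diffusers import StableDiffusionPipeline

# Load the pre-trained SD 1.5 weights through diffusers.
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

image = pipe("a stone castle surrounded by lakes and trees").images[0]
image.save("castle.png")
```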

Experiment Setup: Yes
LLM Response: "We chose N = 10 in most of our experiments. ... w_r ∈ ℝ is a given weight, generally set to 0.1 in our experiments. ... N is generally set at 10 in our experiments (Ma et al. 2023). Large N better preserves the original foreground and background, while smaller N encourages more interaction between the foreground and background."
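
Read as a schedule, these settings could look like the sketch below. Only N = 10 and w_r = 0.1 come from the quoted text; the 50-step count, the variable names, and the "edit during the first N steps" reading are assumptions.

```python
NUM_INFERENCE_STEPS = 50  # a typical Stable Diffusion default, not stated per experiment
N = 10                    # denoising steps with attention editing (paper: N = 10)
W_R = 0.1                 # regularization weight w_r (paper: generally 0.1)

# Editing is assumed active for the first N steps; afterwards the model
# denoises normally, letting foreground and background interact.
edit_active = [step < N for step in range(NUM_INFERENCE_STEPS)]
```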