Easing Concept Bleeding in Diffusion via Entity Localization and Anchoring
Authors: Jiewei Zhang, Song Guo, Peiran Dong, Jie Zhang, Ziming Liu, Yue Yu, Xiao-Ming Wu
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments demonstrate its superior capability in precisely generating multiple objects as specified in the textual prompts. Experimental results illustrate that our approach excels in accurately generating multiple objects. In this section, we will conduct a thorough qualitative and quantitative comparison of our method with existing approaches. |
| Researcher Affiliation | Academia | ¹The Hong Kong Polytechnic University. ²Peng Cheng Laboratory. ³The Hong Kong University of Science and Technology. |
| Pseudocode | Yes | Algorithm 1 Entity Localization and Anchoring |
| Open Source Code | No | The paper does not provide an explicit statement about the release of source code or a link to a code repository for the described methodology. |
| Open Datasets | No | The paper uses generated images based on specified prompt formats (e.g., 'a [entity A] and a [entity B]') for evaluation. While the paper refers to sets of entities (e.g., '20 animals and objects'), it does not mention or provide access information for a publicly available, formal training dataset. |
| Dataset Splits | No | The paper focuses on evaluating generated images from prompts rather than training on a specific dataset with explicit train/validation/test splits. Therefore, it does not specify dataset split information. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, processor types, memory amounts) used for running its experiments. |
| Software Dependencies | Yes | Our algorithm is employed within the pre-trained stable diffusion V-1.4. |
| Experiment Setup | Yes | Specifically, we concentrate on cross-attention maps associated with entities mentioned in the prompt. These maps are primarily extracted in the upsampling block with a resolution of 16 × 16. ... We configure the start and end timesteps (T_start, T_end) to establish meaningful constraints on entities. ... where λ serves as the weighting factor. |
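
The Experiment Setup row names three ingredients: 16 × 16 cross-attention maps for entity tokens, a timestep window (T_start, T_end), and a weighting factor λ. The sketch below shows one way these pieces could fit together. It is not the paper's implementation: the function names `entity_maps` and `constraint_weight`, the flattened `(256, num_tokens)` attention layout, and the reading of the window as an inclusive range of denoising step indices are all assumptions made for illustration, and the paper's actual constraint terms are not reproduced here.

```python
from typing import List, Sequence

RES = 16  # attention-map resolution quoted in the table (16 x 16)


def entity_maps(
    attn: Sequence[Sequence[float]], token_ids: Sequence[int]
) -> List[List[List[float]]]:
    """Select entity-token columns from a flattened cross-attention matrix.

    attn is assumed (hypothetically) to have RES*RES rows, one per spatial
    location, and one column per prompt token. Each selected column is
    reshaped into a RES x RES map.
    """
    if len(attn) != RES * RES:
        raise ValueError(f"expected {RES * RES} rows, got {len(attn)}")
    maps = []
    for tok in token_ids:
        flat = [row[tok] for row in attn]  # one value per spatial location
        maps.append([flat[r * RES:(r + 1) * RES] for r in range(RES)])
    return maps


def constraint_weight(step: int, t_start: int, t_end: int, lam: float) -> float:
    """Return lam while step lies in [t_start, t_end], otherwise 0.0.

    Treating the window as inclusive step indices is an assumption; the
    source only says that start and end timesteps are configured.
    """
    return lam if t_start <= step <= t_end else 0.0
```

Under these assumptions, a sampling loop would multiply whatever entity constraint it computes on the selected maps by `constraint_weight(step, t_start, t_end, lam)`, so the constraint has no effect outside the configured window.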