On Mechanistic Knowledge Localization in Text-to-Image Generative Models
Authors: Samyadeep Basu, Keivan Rezaei, Priyatham Kattakinda, Vlad I Morariu, Nanxuan Zhao, Ryan A. Rossi, Varun Manjunatha, Soheil Feizi
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we empirically observe the effectiveness of causal tracing to models beyond Stable-Diffusion-v15. ... In this section, we provide empirical results highlighting the localized layers across various open-source text-to-image generative models: ... Human-Study Results. We run a human-study to verify that LOCOGEN can effectively identify controlling layers for different visual attributes. ... In Fig 57 we provide a comprehensive comparison and analysis of how LOCOEDIT compares to other methods. |
| Researcher Affiliation | Collaboration | University of Maryland; Adobe Research. |
| Pseudocode | Yes | Algorithm 1 provides the pseudocode to find the best candidate. |
| Open Source Code | Yes | Code will be available at https://github.com/samyadeepbasu/LocoGen. |
| Open Datasets | Yes | We use the benchmark dataset from (Basu et al., 2023) and (Kumari et al., 2023) for obtaining prompts for objects, style and facts. ... In particular, we curate a set of 320 prompts from MS-COCO with 80 objects and 4 locations (beach, forest, city, house) for each. |
| Dataset Splits | No | The paper discusses the use of prompts for generating and evaluating images but does not specify training, validation, or test dataset splits with explicit percentages or counts. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU models, CPU types, memory amounts) used for running the experiments. |
| Software Dependencies | No | The paper does not provide specific software dependencies with version numbers (e.g., library names, framework versions) used for replicating the experiments. |
| Experiment Setup | Yes | We set the following hyper-parameters for λK and λV in LOCOEDIT as 0.01 for all the text-to-image models, as it led to the best editing results. ... To select the cardinality of the set C, we run an iterative hyper-parameter search with m ∈ [1, M]... |
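
The iterative search over the cardinality m ∈ [1, M] mentioned in the Experiment Setup row can be pictured with the minimal sketch below. It is an illustration under stated assumptions, not the authors' Algorithm 1: it assumes the candidate set is a window of m consecutive layers, and the names `search_layer_window`, `score_fn`, and `toy_score` are hypothetical. In the paper the score would come from evaluating generated images; here a toy scoring function stands in for that step.

```python
from typing import Callable, Tuple


def search_layer_window(
    num_layers: int,
    max_m: int,
    score_fn: Callable[[int, int], float],
) -> Tuple[int, int, float]:
    """Exhaustively try every window size m in [1, max_m] and every start
    position, returning (start, m, score) for the highest-scoring window.

    score_fn(start, m) is a placeholder (assumption) for whatever criterion
    is used to judge a candidate set of m consecutive layers.
    """
    best = None
    for m in range(1, max_m + 1):                 # candidate cardinalities
        for start in range(num_layers - m + 1):   # candidate start layers
            score = score_fn(start, m)
            if best is None or score > best[2]:   # keep the first best on ties
                best = (start, m, score)
    if best is None:
        raise ValueError("num_layers and max_m must both be at least 1")
    return best


def toy_score(start: int, m: int) -> float:
    """Hypothetical score: rewards covering layers 6..8, penalizes extras."""
    target = set(range(6, 9))
    window = set(range(start, start + m))
    return len(target & window) - 0.5 * len(window - target)


start, m, score = search_layer_window(num_layers=16, max_m=5, score_fn=toy_score)
print(f"best window starts at layer {start} with m={m} (score={score})")
```

With the toy score, the search recovers the window that exactly covers the three target layers; swapping in a real image-based metric would change only `score_fn`.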