On Mechanistic Knowledge Localization in Text-to-Image Generative Models

Authors: Samyadeep Basu, Keivan Rezaei, Priyatham Kattakinda, Vlad I Morariu, Nanxuan Zhao, Ryan A. Rossi, Varun Manjunatha, Soheil Feizi

ICML 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this section, we empirically observe the effectiveness of causal tracing to models beyond Stable-Diffusion-v15. ... In this section, we provide empirical results highlighting the localized layers across various open-source text-to-image generative models: ... Human-Study Results. We run a human-study to verify that LOCOGEN can effectively identify controlling layers for different visual attributes. ... In Fig 57 we provide a comprehensive comparison and analysis of how LOCOEDIT compares to other methods.
Researcher Affiliation | Collaboration | ¹University of Maryland, ²Adobe Research.
Pseudocode | Yes | Algorithm 1 provides the pseudocode to find the best candidate.
Open Source Code | Yes | Code will be available at https://github.com/samyadeepbasu/LocoGen.
Open Datasets | Yes | We use the benchmark dataset from (Basu et al., 2023) and (Kumari et al., 2023) for obtaining prompts for objects, style and facts. ... In particular, we curate a set of 320 prompts from MS-COCO with 80 objects and 4 locations (beach, forest, city, house) for each.
Dataset Splits | No | The paper discusses the use of prompts for generating and evaluating images but does not specify training, validation, or test dataset splits with explicit percentages or counts.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU models, CPU types, memory amounts) used for running the experiments.
Software Dependencies | No | The paper does not provide specific software dependencies with version numbers (e.g., library names, framework versions) used for replicating the experiments.
Experiment Setup | Yes | We set the following hyper-parameters for λK and λV in LOCOEDIT as 0.01 for all the text-to-image models, as it led to the best editing results. ... To select the cardinality of the set C, we run an iterative hyper-parameter search with m ∈ [1, M]...
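
The prompt set described under Open Datasets (80 MS-COCO objects, each paired with the 4 locations, giving 320 prompts) can be sketched as a simple cross product. This is a minimal illustration, not the authors' code: the function name `build_prompts`, the prompt template, and the placeholder object names are all assumptions, since the excerpt does not state the template or list the objects.

```python
from itertools import product

# The four locations named in the paper excerpt.
LOCATIONS = ["beach", "forest", "city", "house"]


def build_prompts(objects, locations=LOCATIONS, template="a {obj} in a {loc}"):
    """Return one prompt per (object, location) pair.

    The template is a hypothetical choice for illustration only.
    """
    return [template.format(obj=o, loc=l) for o, l in product(objects, locations)]


# Placeholder names standing in for the 80 MS-COCO object categories
# (assumption: the real category names would be substituted here).
objects = [f"object_{i}" for i in range(80)]
prompts = build_prompts(objects)

# 80 objects x 4 locations
print(len(prompts))
```

Swapping in the actual MS-COCO category names and the paper's prompt wording would leave the count unchanged, because the size depends only on the 80 × 4 pairing.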