SEGA: Instructing Text-to-Image Models using Semantic Guidance
Authors: Manuel Brack, Felix Friedrich, Dominik Hintersdorf, Lukas Struppek, Patrick Schramowski, Kristian Kersting
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | 5 Experimental Evaluation |
| Researcher Affiliation | Collaboration | Manuel Brack (1,2), Felix Friedrich (2,3), Dominik Hintersdorf (2), Lukas Struppek (2), Patrick Schramowski (1,2,3,4), Kristian Kersting (1,2,3,5). Affiliations: (1) German Research Center for Artificial Intelligence (DFKI); (2) Computer Science Department, TU Darmstadt; (3) Hessian.AI; (4) LAION; (5) Centre for Cognitive Science, TU Darmstadt |
| Pseudocode | Yes | Additionally, we also provide the pseudo-code notation of SEGA in Alg 1. |
| Open Source Code | Yes | Implementation available in diffusers: https://huggingface.co/docs/diffusers/api/pipelines/semantic_stable_diffusion |
| Open Datasets | Yes | "This setting is inspired by the CelebA dataset [17] and marks a well-established benchmark for semantic changes in image generation." and "Utilizing the facial images from the previous experiment, we calculated FID scores against a reference dataset of FFHQ [11]." |
| Dataset Splits | No | The paper uses pre-trained models (Stable Diffusion, Paella, DeepFloyd-IF) and conducts user studies on generated images, so it does not specify training/validation/test splits for its own experiments. |
| Hardware Specification | No | The paper does not explicitly describe the specific hardware (e.g., GPU/CPU models, memory) used to run its experiments. |
| Software Dependencies | No | The paper mentions building on the 'diffusers library' and using 'Stable Diffusion v1.5' but does not provide specific version numbers for software dependencies or programming languages. |
| Experiment Setup | Yes | "Let us provide some more detailed intuition for each of SEGA's hyper-parameters": scale s_e, threshold λ, warmup δ, and momentum. and "All images are generated from the same original image (shown in Fig. 10) obtained by the prompt 'a house at a lake'." |
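
The hyper-parameters named in the Experiment Setup row (scale, threshold, warmup, momentum) can be illustrated with a minimal pure-Python sketch of a single guidance step. This is a simplified illustration and not the authors' implementation: the function name `semantic_guidance_step`, the argument names, the default values, and the list-based representation of noise estimates are all assumptions made for the example. The momentum scale and momentum decay parameters are likewise assumed names for the momentum term.

```python
def semantic_guidance_step(eps_uncond, eps_concept, step, momentum,
                           scale=5.0, threshold=0.9, warmup=2,
                           momentum_scale=0.3, momentum_beta=0.5):
    """Hypothetical sketch of one guidance step.

    Returns (guidance applied at this step, updated momentum).
    All names and defaults are illustrative assumptions.
    """
    # Concept direction: concept-conditioned minus unconditioned noise estimate.
    psi = [c - u for c, u in zip(eps_concept, eps_uncond)]

    # Threshold: keep only elements whose magnitude reaches the
    # `threshold` quantile of all magnitudes; zero out the rest.
    magnitudes = sorted(abs(p) for p in psi)
    index = min(int(threshold * len(magnitudes)), len(magnitudes) - 1)
    cutoff = magnitudes[index]

    # Scale: surviving elements are multiplied by the guidance scale.
    gamma = [scale * p if abs(p) >= cutoff else 0.0 for p in psi]

    # Momentum: add the accumulated direction, then update it as an
    # exponential moving average of the guidance term.
    guided = [g + momentum_scale * m for g, m in zip(gamma, momentum)]
    new_momentum = [momentum_beta * m + (1.0 - momentum_beta) * g
                    for m, g in zip(momentum, guided)]

    # Warmup: during the first `warmup` steps no guidance is applied,
    # although momentum is still accumulated in this sketch.
    applied = guided if step >= warmup else [0.0] * len(psi)
    return applied, new_momentum


# Toy usage with four-element "noise estimates".
uncond = [0.0, 0.0, 0.0, 0.0]
concept = [0.1, -0.4, 0.2, 1.0]
zero_momentum = [0.0] * 4

early, mom = semantic_guidance_step(uncond, concept, step=0,
                                    momentum=zero_momentum, threshold=0.75)
late, mom2 = semantic_guidance_step(uncond, concept, step=2,
                                    momentum=mom, threshold=0.75)
```

In the toy run, the step inside the warmup window returns an all-zero guidance vector while still building up momentum, and the later step applies guidance only to the single element that passes the threshold.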