The Hidden Language of Diffusion Models
Authors: Hila Chefer, Oran Lang, Mor Geva, Volodymyr Polosukhin, Assaf Shocher, Michal Irani, Inbar Mosseri, Lior Wolf
ICLR 2024 | Conference PDF | arXiv PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Through a large battery of experiments, we demonstrate CONCEPTOR's ability to provide meaningful, robust, and faithful decompositions for a wide variety of abstract, concrete, and complex textual concepts, while allowing to naturally connect each decomposition element to its corresponding visual impact on the generated images. We conduct an ablation study to examine the impact of each component on our method. |
| Researcher Affiliation | Collaboration | Hila Chefer¹,², Oran Lang¹, Mor Geva³, Volodymyr Polosukhin¹, Assaf Shocher¹, Michal Irani¹,⁴, Inbar Mosseri¹, Lior Wolf²; ¹Google Research, ²Tel Aviv University, ³Google DeepMind, ⁴Weizmann Institute |
| Pseudocode | No | The paper includes diagrams and descriptions of the method, such as Figure 6: "Illustration of the CONCEPTOR method", but no formal pseudocode or algorithm blocks with structured steps. |
| Open Source Code | Yes | Please find our code attached as a ZIP file to reproduce our results. |
| Open Datasets | Yes | Data: We construct a diverse and comprehensive dataset of 188 concepts, comprised of the basic classes from CIFAR-10 (Krizhevsky, 2009), a list of 28 professions from the Bias in Bios dataset (De-Arteaga et al., 2019), 10 basic emotions and 10 basic actions, all 30 prompts from the website "Best 30 Stable Diffusion Prompts for Great Images", which contains complex prompts that require hierarchical reasoning (e.g., "Medieval village life", "impression of Japanese serenity"), and, finally, we consider 100 random concepts from the ConceptNet (Speer & Havasi, 2013) knowledge graph to allow for large-scale evaluation of the methods. *(The 188-concept tally is sketched in code after the table.)* |
| Dataset Splits | Yes | We begin by collecting a training set T of 100 concept images. These images provide the statistics for the concept features we wish to learn. We use a test set of 100 seeds to generate images with w_c and with each method. We conduct validation every 50 optimization steps on 20 images with a validation seed and select the iteration with the best CLIP pairwise similarity between the reconstruction and the concept images. *(This checkpoint-selection rule is sketched in code after the table.)* |
| Hardware Specification | Yes | All of our experiments were conducted using a single A100 GPU with 40GB of memory. |
| Software Dependencies | No | The paper mentions 'Stable Diffusion v2.1' and the 'OpenCLIP ViT-H model' but does not specify software dependencies with version numbers, such as Python, PyTorch, or CUDA versions. |
| Experiment Setup | Yes | We train our MLP as specified in Sec. 3 of the main paper with 100 images generated from the concept using seed 1024, for a maximum of 500 training steps with a batch size of 6 (which is the largest batch size that could fit on our GPU). Additionally, we use a learning rate of 1e-3 (grid-searched on 5 concepts over 1e-2, 1e-3, 1e-4). *(These hyperparameters are wired into a training sketch after the table.)* |
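
The 188-concept total in the Open Datasets row follows from the component counts quoted there. A minimal tally to check the arithmetic; the dictionary keys are illustrative labels, not identifiers from the paper's code:

```python
# Tally of the dataset composition quoted in the Open Datasets row.
components = {
    "CIFAR-10 basic classes": 10,
    "Bias in Bios professions": 28,
    "basic emotions": 10,
    "basic actions": 10,
    "Stable Diffusion prompts": 30,
    "ConceptNet concepts": 100,
}
assert sum(components.values()) == 188  # matches the reported total
```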
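The Dataset Splits row describes a checkpoint-selection rule: validate every 50 optimization steps on 20 images and keep the iteration with the best CLIP pairwise similarity between reconstructions and concept images. Below is a minimal sketch of that rule, assuming random tensors as stand-ins for real CLIP image features; `pairwise_clip_similarity` and the placeholder data are assumptions for illustration, not the paper's code:

```python
import torch
import torch.nn.functional as F

def pairwise_clip_similarity(recon, concept):
    """Mean cosine similarity over all (reconstruction, concept) pairs."""
    recon = F.normalize(recon, dim=-1)      # (N, D) image features
    concept = F.normalize(concept, dim=-1)  # (M, D) image features
    return (recon @ concept.T).mean().item()

# Placeholder features standing in for CLIP embeddings of the 100
# concept images; a real run would encode images with OpenCLIP ViT-H.
concept_feats = torch.randn(100, 1024)

best_score, best_step = -float("inf"), None
for step in range(50, 501, 50):            # validate every 50 steps
    recon_feats = torch.randn(20, 1024)    # placeholder: features of 20
                                           # validation generations at
                                           # this checkpoint
    score = pairwise_clip_similarity(recon_feats, concept_feats)
    if score > best_score:
        best_score, best_step = score, step
```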
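The Experiment Setup row reports concrete hyperparameters: seed 1024 for generating the 100 concept images, at most 500 training steps, batch size 6, and a learning rate of 1e-3. The hedged PyTorch sketch below wires those reported numbers into a training skeleton; the MLP shape and the placeholder loss are illustrative stand-ins, not CONCEPTOR's actual objective, which trains the MLP through the diffusion denoising loss:

```python
# Minimal sketch of the reported training configuration (hypothetical
# module and loss stand-ins; the real objective couples an MLP over
# text-token embeddings with Stable Diffusion's denoising loss).
import torch
import torch.nn as nn

torch.manual_seed(1024)           # seed reported for the concept images

EMBED_DIM  = 1024                 # text-embedding width (assumed)
BATCH_SIZE = 6                    # largest batch fitting a 40GB A100 (paper)
MAX_STEPS  = 500                  # maximum training steps (paper)
LR         = 1e-3                 # grid-searched over {1e-2, 1e-3, 1e-4}

# Hypothetical stand-in for the learned MLP.
mlp = nn.Sequential(
    nn.Linear(EMBED_DIM, EMBED_DIM),
    nn.ReLU(),
    nn.Linear(EMBED_DIM, 1),
)
optimizer = torch.optim.Adam(mlp.parameters(), lr=LR)

# Placeholder training set: the paper uses features derived from 100
# concept images; random embeddings keep this sketch self-contained.
train_embeddings = torch.randn(100, EMBED_DIM)

for step in range(MAX_STEPS):
    idx = torch.randint(0, len(train_embeddings), (BATCH_SIZE,))
    batch = train_embeddings[idx]
    loss = mlp(batch).pow(2).mean()  # placeholder loss, not the paper's
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```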