MultiDiffusion: Fusing Diffusion Paths for Controlled Image Generation
Authors: Omer Bar-Tal, Lior Yariv, Yaron Lipman, Tali Dekel
ICML 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We thoroughly evaluate our method when applied to each task as discussed in Sec. 4. In all experiments, we used Stable Diffusion (Rombach et al., 2022), where the diffusion process is defined over a latent space I = R^(64×64×4), and a decoder is trained to reconstruct natural images in higher resolution [0, 1]^(512×512×3). |
| Researcher Affiliation | Collaboration | 1Weizmann Institute of Science 2Meta AI. |
| Pseudocode | Yes | Algorithm 1: MultiDiffusion sampling. |
| Open Source Code | No | Project page is available at https://multidiffusion.github.io. This is a project page, not an explicit statement of code release or a direct link to a code repository. |
| Open Datasets | Yes | To quantitatively evaluate our performance, we use the COCO dataset (Lin et al., 2014), which contains images with global text caption and instance masks for each object in the image. |
| Dataset Splits | No | We apply our method on a subset from the validation set, obtained by filtering examples which consist of 2 to 4 foreground objects, excluding people, and masks that occupy less than 5% of the image. The paper evaluates on a filtered subset of the COCO validation set, but since the method requires no training, it specifies no train/validation split of its own. |
| Hardware Specification | No | The paper does not provide specific details on the hardware used for running experiments, such as CPU or GPU models. |
| Software Dependencies | No | In all experiments, we used Stable Diffusion (Rombach et al., 2022). The paper mentions using ‘Stable Diffusion’ but does not provide specific version numbers for software dependencies or libraries. |
| Experiment Setup | Yes | We set T_init to be 20% of the generation process (i.e., T_init = 800). In all experiments, we used Stable Diffusion (Rombach et al., 2022), where the diffusion process is defined over a latent space I = R^(64×64×4), and a decoder is trained to reconstruct natural images in higher resolution [0, 1]^(512×512×3). Similarly, the MultiDiffusion process Ψ is defined in the latent space J = R^(H×W×4), and using the decoder we produce the results in the target image space [0, 1]^(8H×8W×3). |
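The core of the MultiDiffusion process Ψ quoted above is fusing the base model's denoising updates on overlapping crops of a larger latent J into a single consistent update. The following is a minimal sketch of that fusion step, not the authors' implementation: `denoise_step` stands in for one Stable Diffusion denoising update on a crop, and the window coordinates and uniform (unweighted) averaging are simplifying assumptions.

```python
import numpy as np

def multidiffusion_step(latent, windows, denoise_step):
    """One fused MultiDiffusion update (sketch): run the base denoiser
    on each window of the large latent, then average the overlapping
    per-pixel predictions. `windows` is a list of (y0, y1, x0, x1)
    crop coordinates; `denoise_step` is a stand-in for one base-model
    denoising update on a crop (hypothetical signature)."""
    num = np.zeros_like(latent, dtype=float)  # sum of window predictions
    den = np.zeros_like(latent, dtype=float)  # per-pixel coverage count
    for (y0, y1, x0, x1) in windows:
        crop = latent[y0:y1, x0:x1]
        pred = denoise_step(crop)             # base model's update for this crop
        num[y0:y1, x0:x1] += pred
        den[y0:y1, x0:x1] += 1.0
    # Average where windows overlap; leave uncovered pixels at zero.
    return num / np.maximum(den, 1.0)
```

In the paper this least-squares fusion is what reconciles the separate diffusion paths: pixels covered by several windows receive the average of their predicted updates, so adjacent crops stay mutually consistent across denoising steps.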