MultiDiffusion: Fusing Diffusion Paths for Controlled Image Generation

Authors: Omer Bar-Tal, Lior Yariv, Yaron Lipman, Tali Dekel

ICML 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We thoroughly evaluate our method when applied to each task as discussed in Sec. 4. In all experiments, we used Stable Diffusion (Rombach et al., 2022), where the diffusion process is defined over a latent space $\mathcal{I} = \mathbb{R}^{64 \times 64 \times 4}$, and a decoder is trained to reconstruct natural images in higher resolution $[0, 1]^{512 \times 512 \times 3}$."
Researcher Affiliation | Collaboration | "¹Weizmann Institute of Science, ²Meta AI."
Pseudocode | Yes | "Algorithm 1: MultiDiffusion sampling." (A hedged sketch of one MultiDiffusion denoising step appears after the table.)
Open Source Code | No | "Project page is available at https://multidiffusion.github.io." This is a project-page link, not an explicit statement of code release or a direct link to a code repository.
Open Datasets | Yes | "To quantitatively evaluate our performance, we use the COCO dataset (Lin et al., 2014), which contains images with a global text caption and instance masks for each object in the image."
Dataset Splits | No | "We apply our method on a subset from the validation set, obtained by filtering examples which consist of 2 to 4 foreground objects, excluding people, and masks that occupy less than 5% of the image." The paper evaluates on this filtered COCO validation subset, but since the method requires no training, it specifies no train/validation split of its own. (A sketch of one reading of the filtering rule appears after the table.)
Hardware Specification | No | The paper does not provide specific details on the hardware used to run the experiments, such as CPU or GPU models.
Software Dependencies | No | "In all experiments, we used Stable Diffusion (Rombach et al., 2022)." The paper names Stable Diffusion but gives no version numbers for software dependencies or libraries.
Experiment Setup | Yes | "We set $T_{\mathrm{init}}$ to be 20% of the generation process (i.e., $T_{\mathrm{init}} = 800$). In all experiments, we used Stable Diffusion (Rombach et al., 2022), where the diffusion process is defined over a latent space $\mathcal{I} = \mathbb{R}^{64 \times 64 \times 4}$, and a decoder is trained to reconstruct natural images in higher resolution $[0, 1]^{512 \times 512 \times 3}$. Similarly, the MultiDiffusion process $\Psi$ is defined in the latent space $\mathcal{J} = \mathbb{R}^{H \times W \times 4}$, and using the decoder we produce the results in the target image space $[0, 1]^{8H \times 8W \times 3}$." (A latent/pixel bookkeeping sketch for this setup appears below.)
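
To make the Algorithm 1 row concrete, here is a minimal sketch of one MultiDiffusion denoising step in PyTorch. The callable `denoise_step` and the `windows` layout are hypothetical stand-ins for a pretrained Stable Diffusion step and the paper's overlapping-crop scheme; the per-pixel averaging is the closed-form solution of the paper's least-squares fusion objective.

```python
import torch

def multidiffusion_step(latent, t, denoise_step, windows):
    """One fused denoising step over overlapping windows (hedged sketch).

    latent:       full canvas latent, shape (1, 4, H, W)
    t:            current diffusion timestep
    denoise_step: placeholder callable mapping a (1, 4, h, w) crop at step t
                  to its denoised crop (e.g., one Stable Diffusion step)
    windows:      list of (top, left, height, width) crop coordinates
    """
    value = torch.zeros_like(latent)   # accumulates per-pixel predictions
    count = torch.zeros_like(latent)   # how many windows cover each pixel

    for (top, left, h, w) in windows:
        crop = latent[:, :, top:top + h, left:left + w]
        denoised = denoise_step(crop, t)
        value[:, :, top:top + h, left:left + w] += denoised
        count[:, :, top:top + h, left:left + w] += 1

    # Fusing the per-window diffusion paths reduces to averaging all
    # window predictions covering each pixel; clamp avoids division by
    # zero on any pixel no window happens to cover.
    return value / count.clamp(min=1)
```

Because every window is denoised by the same pretrained model and then reconciled by averaging at each step, the crops follow a single coherent path, which is what removes seams between adjacent windows.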
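For the dataset-splits row, the following hedged sketch shows one plausible reading of the quoted filtering rule using pycocotools: keep COCO validation images that contain 2 to 4 non-person objects whose instance masks each cover at least 5% of the image. The function name and the per-object threshold interpretation are assumptions, not the authors' released code.

```python
from pycocotools.coco import COCO

def filter_coco_examples(ann_file, min_area_frac=0.05):
    """Return COCO image ids with 2-4 non-person foreground objects whose
    masks each cover at least `min_area_frac` of the image (assumed reading
    of the paper's filtering rule)."""
    coco = COCO(ann_file)
    person_id = coco.getCatIds(catNms=["person"])[0]
    keep = []
    for img_id in coco.getImgIds():
        info = coco.loadImgs(img_id)[0]
        img_area = info["height"] * info["width"]
        anns = coco.loadAnns(coco.getAnnIds(imgIds=img_id, iscrowd=False))
        objs = [a for a in anns
                if a["category_id"] != person_id
                and a["area"] / img_area >= min_area_frac]
        if 2 <= len(objs) <= 4:
            keep.append(img_id)
    return keep
```

The quote leaves ambiguous whether the 5% threshold applies per object or per image; the sketch applies it per object.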
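Last, a small bookkeeping sketch of the quoted experiment setup. Only the 8x decoder scaling and the 20% bootstrapping fraction (hence T_init = 800 with T = 1000 timesteps counting down) come from the quote; the canvas size, window size, and stride are illustrative assumptions.

```python
# Latent/pixel bookkeeping for the quoted setup (assumed canvas and crops).
H, W = 64, 256               # latent canvas J = R^{H x W x 4}, e.g. a panorama
window, stride = 64, 16      # illustrative overlapping 64x64 crops

# The decoder upsamples 8x per spatial dim: (H, W) latents decode to
# (8H, 8W) pixels, matching [0,1]^{8H x 8W x 3} in the quote.
pixel_h, pixel_w = 8 * H, 8 * W          # 512 x 2048 for this canvas

# Overlapping crop origins covering the full canvas.
windows = [(top, left, window, window)
           for top in range(0, H - window + 1, stride)
           for left in range(0, W - window + 1, stride)]

# Bootstrapping covers the first 20% of denoising: with timesteps counting
# down from T = 1000, that is t >= T_init = 800, matching the quote.
T = 1000
T_init = int(0.8 * T)        # 800
```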