DiffEdit: Diffusion-based semantic image editing with mask guidance

Authors: Guillaume Couairon, Jakob Verbeek, Holger Schwenk, Matthieu Cord

ICLR 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We quantitatively evaluate our approach and compare to prior work using images of the ImageNet and COCO dataset, as well as a set of generated images. [...] In this section, we describe our experimental setup, followed by qualitative and quantitative results.
Researcher Affiliation | Collaboration | Guillaume Couairon, Jakob Verbeek, Holger Schwenk (Meta AI, {gcouairon,jjverbeek,schwenk}@meta.com); Matthieu Cord (Sorbonne Université, Valeo.ai, matthieu.cord@sorbonne-universite.fr)
Pseudocode | No | The paper describes the DIFFEDIT framework in Section 3.2 with textual steps and a high-level illustration in Figure 2, but it does not include formal pseudocode blocks or algorithms.
Open Source Code | No | The paper provides links to a third-party pre-trained model (Stable Diffusion) and an unofficial implementation of a comparison method (Cross Attention Control), but not the open-source code for the proposed DIFFEDIT methodology itself.
Open Datasets | Yes | Datasets. We perform experiments on three datasets. First, on ImageNet (Deng et al., 2009)... Second, we consider editing images generated by Imagen (Saharia et al., 2022b)... Third, we consider edits based on images and queries from the COCO (Lin et al., 2014) dataset...
Dataset Splits | Yes | We create a filtered version of this dataset, for which queries are structurally similar to the caption, i.e. where only a few words are changed, but the grammatical structure stays the same. We use the filtering criterion that the total number of words inserted/deleted/replaced must not exceed 25% of the total number of words in the original caption, resulting in a total of 272 queries out of 50k original queries.
Hardware Specification | Yes | This allows to edit images in 10 seconds on a single Quadro GP100 GPU.
Software Dependencies | No | The paper mentions using "latent diffusion models," "Stable Diffusion," and "VQGAN latent spaces" and provides a link to a specific Stable Diffusion model. However, it does not provide specific version numbers for core software libraries or programming languages (e.g., Python, PyTorch/TensorFlow versions).
Experiment Setup | Yes | In our algorithm, we use a Gaussian noise with strength 50%... The result is then rescaled to the range [0, 1], and binarized with a threshold, which we set to 0.5 by default. [...] We use 50 steps in DDIM sampling with a fixed schedule, and the encoding ratio parameter further decreases the number of updates used for our edits. [...] We also use classifier-free guidance (Ho & Salimans, 2022) with the recommended values: 5 on ImageNet, 7.5 for Stable Diffusion.
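The COCO filtering criterion quoted under Dataset Splits (words inserted, deleted, or replaced must not exceed 25% of the caption's words) can be sketched with a word-level diff. The function name, tokenization, and use of `difflib` below are illustrative assumptions, not the authors' released code:

```python
import difflib

def is_structurally_similar(caption: str, query: str, max_change: float = 0.25) -> bool:
    """Return True if the query changes at most `max_change` of the caption's words.

    Counts inserted/deleted/replaced words via a word-level diff; this is
    an assumed reconstruction of the paper's 25% filtering criterion.
    """
    cap_words = caption.lower().split()
    qry_words = query.lower().split()
    matcher = difflib.SequenceMatcher(a=cap_words, b=qry_words)
    changed = 0
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        if op != "equal":
            # Count the larger side of the edit as the number of changed words.
            changed += max(i2 - i1, j2 - j1)
    return changed <= max_change * len(cap_words)
```

For example, swapping a single word in a six-word caption (1/6 of the words) passes the filter, while a query that rewrites most of the caption does not.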
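The mask-extraction step quoted under Experiment Setup (rescale the difference map to [0, 1], then binarize at a 0.5 threshold) can likewise be sketched. Here `diff_map` stands in for the averaged absolute difference between the denoiser's noise estimates under the two prompts; all names and shapes are assumptions, not the paper's implementation:

```python
import numpy as np

def binarize_mask(diff_map: np.ndarray, threshold: float = 0.5) -> np.ndarray:
    """Rescale a noise-estimate difference map to [0, 1] and binarize it.

    Mirrors the setup described in the paper: min-max rescaling followed
    by thresholding at 0.5 by default.
    """
    lo, hi = diff_map.min(), diff_map.max()
    # Small epsilon guards against division by zero on a constant map.
    rescaled = (diff_map - lo) / (hi - lo + 1e-8)
    return (rescaled >= threshold).astype(np.float32)
```

The resulting binary mask selects the spatial region where the two conditionings disagree most, which is where the edit is applied.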