EmerDiff: Emerging Pixel-level Semantic Knowledge in Diffusion Models
Authors: Koichi Namekata, Amirmojtaba Sabour, Sanja Fidler, Seung Wook Kim
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we evaluate our produced segmentation masks both qualitatively and quantitatively. To quantitatively evaluate our segmentation masks, we apply our framework to two downstream tasks: unsupervised semantic segmentation and annotation-free open vocabulary segmentation. |
| Researcher Affiliation | Collaboration | University of Toronto, Vector Institute, NVIDIA |
| Pseudocode | No | The paper describes its methods in paragraph form and with mathematical equations but does not include any pseudocode or algorithm blocks. |
| Open Source Code | No | Project page: https://kmcode1.github.io/Projects/EmerDiff/. This link is a project page, not a direct link to a source-code repository. |
| Open Datasets | Yes | The effectiveness of our framework is extensively evaluated on multiple scene-centric datasets such as COCO-Stuff (Caesar et al., 2018), PASCAL-Context (Mottaghi et al., 2014), ADE20K (Zhou et al., 2019) and Cityscapes (Cordts et al., 2016) |
| Dataset Splits | No | The paper evaluates its framework on existing datasets (COCO-Stuff, PASCAL-Context, ADE20K, Cityscapes) using their ground truth annotations for evaluation, but does not specify a training/validation/test split for its own method or data. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware used for running the experiments. |
| Software Dependencies | No | The paper mentions using the 'official Stable Diffusion v1.4 checkpoint', which is a model version, but does not list the other software dependencies (programming language, libraries, or frameworks) with the specific version numbers needed for reproducibility. |
| Experiment Setup | Yes | Throughout the experiments, we use the official Stable Diffusion v1.4 checkpoint with a DDPM sampling scheme of 50 steps (for clarity purposes, we denote timesteps out of T = 1000). To generate low-resolution segmentation maps, we extract feature maps at timestep t_f = 1 (minimum noise). We apply modulation to the third cross-attention layer of the 16 × 16 upward blocks at timestep t_m = 281 and λ = 10. (See also Section D, 'Hyperparameter Analysis'.) A hedged configuration sketch is given below the table. |
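
The setup row above pins down the main hyperparameters. As a minimal sketch (not the authors' released code), assuming the Hugging Face `diffusers` library and the public `CompVis/stable-diffusion-v1-4` checkpoint, the reported configuration could be loaded roughly as follows; the mapping of `up_blocks[1].attentions[2]` to "the third cross-attention layer of the 16 × 16 upward blocks" is an assumption that should be verified against the checkpoint's UNet architecture:

```python
# Minimal sketch of the reported setup (assumptions: Hugging Face `diffusers`
# and the public CompVis/stable-diffusion-v1-4 checkpoint; this is NOT the
# authors' released code).
import torch
from diffusers import StableDiffusionPipeline, DDPMScheduler

pipe = StableDiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4", torch_dtype=torch.float16
).to("cuda")

# DDPM sampling scheme with 50 steps over T = 1000, as reported in the paper.
pipe.scheduler = DDPMScheduler.from_config(pipe.scheduler.config)
pipe.scheduler.set_timesteps(50)

# Hyperparameters from the paper (timesteps indexed out of T = 1000):
T_FEAT = 1    # t_f: feature-extraction timestep (minimum noise)
T_MOD = 281   # t_m: modulation timestep
LAMBDA = 10   # lambda: modulation strength

# The paper modulates the third cross-attention layer of the 16x16 upward
# blocks. In diffusers' UNet layout, up_blocks[1] operates at 16x16 spatial
# resolution for 512x512 inputs, so the assumed target is its third attention
# module (verify against the actual checkpoint before relying on this mapping).
target_layer = pipe.unet.up_blocks[1].attentions[2]
print(type(target_layer).__name__)  # expected: Transformer2DModel
```

How the modulation itself is applied at t_m (and how feature maps are extracted at t_f) is described only in prose and equations in the paper, so those steps are left out of this sketch rather than guessed at.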