EmerDiff: Emerging Pixel-level Semantic Knowledge in Diffusion Models

Authors: Koichi Namekata, Amirmojtaba Sabour, Sanja Fidler, Seung Wook Kim

ICLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this section, we evaluate our produced segmentation masks both qualitatively and quantitatively. To quantitatively evaluate our segmentation masks, we apply our framework to two downstream tasks: unsupervised semantic segmentation and annotation-free open-vocabulary segmentation.
Researcher Affiliation | Collaboration | ¹University of Toronto, ²Vector Institute, ³NVIDIA
Pseudocode | No | The paper describes its methods in paragraph form and with mathematical equations but does not include any pseudocode or algorithm blocks.
Open Source Code | No | Project page: https://kmcode1.github.io/Projects/EmerDiff/. This link is a project page, not a direct link to a source-code repository.
Open Datasets | Yes | The effectiveness of our framework is extensively evaluated on multiple scene-centric datasets such as COCO-Stuff (Caesar et al., 2018), PASCAL-Context (Mottaghi et al., 2014), ADE20K (Zhou et al., 2019) and Cityscapes (Cordts et al., 2016).
Dataset Splits | No | The paper evaluates its framework on existing datasets (COCO-Stuff, PASCAL-Context, ADE20K, Cityscapes) using their ground-truth annotations, but does not specify a training/validation/test split for its own method or data.
Hardware Specification | No | The paper does not provide any specific details about the hardware used for running the experiments.
Software Dependencies | No | The paper mentions using the 'official Stable Diffusion v1.4 checkpoint', which is a model version, but does not list other software dependencies (e.g., programming language, libraries, or frameworks with specific version numbers) needed for reproducibility.
Experiment Setup | Yes | Throughout the experiments, we use the official Stable Diffusion v1.4 checkpoint with the DDPM sampling scheme of 50 steps (for clarity purposes, we denote timesteps out of T = 1000). To generate low-resolution segmentation maps, we extract feature maps at timestep t_f = 1 (minimum noise). We apply modulation to the third cross-attention layer of the 16 × 16 upward blocks at timestep t_m = 281 with λ = 10. (See also Section D, 'HYPERPARAMETER ANALYSIS'.)
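The quoted setup amounts to a small set of hyperparameters on top of a standard Stable Diffusion v1.4 pipeline. The sketch below is not the authors' implementation; it only collects those hyperparameters into a runnable configuration, assuming the Hugging Face diffusers library. The model ID, variable names, and config dictionary are illustrative choices, and the paper-specific cross-attention modulation itself is not implemented here.

```python
# Minimal configuration sketch, assuming the Hugging Face diffusers library.
# It mirrors only the hyperparameters quoted above; the paper's cross-attention
# modulation and mask-generation procedure are not reproduced here.
import torch
from diffusers import StableDiffusionPipeline, DDPMScheduler

pipe = StableDiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4",  # official Stable Diffusion v1.4 checkpoint
    torch_dtype=torch.float16,
)
pipe.scheduler = DDPMScheduler.from_config(pipe.scheduler.config)  # DDPM sampling

emerdiff_config = {
    "num_sampling_steps": 50,   # 50-step DDPM sampling schedule
    "T": 1000,                  # timesteps reported on the 0..1000 scale
    "t_feature": 1,             # t_f: feature extraction at minimum noise
    "t_modulation": 281,        # t_m: timestep at which modulation is applied
    "lambda_modulation": 10,    # λ: modulation strength
    # layer targeted by the modulation, as described in the quoted setup
    "modulated_layer": "third cross-attention layer of the 16x16 upward blocks",
}
```

Reproducing the reported segmentation masks would additionally require the feature-extraction and modulation steps described in the paper.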