SinDDM: A Single Image Denoising Diffusion Model

Authors: Vladimir Kulikov, Shahar Yadin, Matan Kleiner, Tomer Michaeli

ICML 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Table 1 reports a quantitative comparison to other single image generative models on all 12 images appearing in this paper (see App. G.1 for more comparisons). Each measure in the table is computed over 50 samples per training image (we report mean and standard deviation). As can be seen, the diversity of our generated samples (both pixel standard deviation and average LPIPS distance between pairs of samples) is higher than that of the competing methods. (See the LPIPS sketch after this table.)
Researcher Affiliation | Academia | Faculty of Electrical and Computer Engineering, Technion - Israel Institute of Technology, Haifa, Israel. Correspondence to: Vladimir Kulikov <vladimir.k@campus.technion.ac.il>.
Pseudocode | Yes | Algorithm 1 SinDDM Training and Algorithm 2 SinDDM Sampling
Open Source Code | Yes | Results, code and the Supplementary Material are available on the project's webpage.
Open Datasets | No | We trained SinDDM on images of different styles, including urban and nature scenery as well as art paintings. The paper does not provide concrete access information or citations for these specific training images/datasets.
Dataset Splits | No | The paper trains on a single image and evaluates generation quality but does not describe any specific training/validation/test dataset splits.
Hardware Specification | Yes | The model has a total of 1.1 × 10^6 parameters and its training on a 250 × 200 image takes around 7 hours on an A6000 GPU.
Software Dependencies | No | The paper mentions using the Adam optimizer and a pre-trained CLIP model but does not specify version numbers for any software dependencies.
Experiment Setup | Yes | We train the model for 120,000 steps using the Adam optimizer with its default parameters (see App. C for further details). Our model comprises 4 convolutional blocks, with a total receptive field of 35 × 35. The number of scales is chosen such that the area covered by the receptive field is as close as possible to 40% of the area of the entire image at scale 0.
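
The scale-selection rule quoted in the Experiment Setup row (receptive-field area as close as possible to 40% of the image area at scale 0) can be read as a small search over candidate scale counts. The sketch below is not the authors' code: the per-scale downscaling factor `r`, the search range, and the helper name `choose_num_scales` are illustrative assumptions.

```python
def choose_num_scales(height, width, receptive_field=35,
                      target_coverage=0.40, r=1.411, max_scales=20):
    """Pick the number of scales N so that the receptive-field area is as
    close as possible to `target_coverage` of the image area at the
    coarsest scale (scale 0).
    NOTE: `r` (per-scale downscaling factor) and `max_scales` are assumed
    placeholders, not values quoted in this excerpt."""
    rf_area = receptive_field ** 2
    best_n, best_gap = 1, float("inf")
    for n in range(1, max_scales + 1):
        # image area after downscaling (n - 1) times from full resolution
        scale0_area = (height / r ** (n - 1)) * (width / r ** (n - 1))
        gap = abs(rf_area / scale0_area - target_coverage)
        if gap < best_gap:
            best_n, best_gap = n, gap
    return best_n

# For the 250 x 200 training image mentioned under Hardware Specification:
print(choose_num_scales(250, 200))  # -> 5 under the assumed r = 1.411 (illustrative only)
```

The number of scales returned depends on the downscaling factor actually used in the paper; the sketch only illustrates the "closest to 40% coverage" criterion.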
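
For the diversity measure quoted under Research Type (average LPIPS distance between pairs of generated samples), a minimal sketch is shown below, assuming the common `lpips` PyTorch package. The backbone choice (`net="alex"`) and the helper name `average_pairwise_lpips` are assumptions; the excerpt does not specify them.

```python
import itertools

import torch
import lpips  # pip install lpips

def average_pairwise_lpips(samples, net="alex", device="cpu"):
    """Average LPIPS distance over all pairs of generated samples.
    `samples` is a float tensor of shape (N, 3, H, W) scaled to [-1, 1]."""
    loss_fn = lpips.LPIPS(net=net).to(device)
    dists = []
    with torch.no_grad():
        for i, j in itertools.combinations(range(samples.shape[0]), 2):
            d = loss_fn(samples[i:i + 1].to(device), samples[j:j + 1].to(device))
            dists.append(d.item())
    return sum(dists) / len(dists)
```

Applied to the 50 samples per training image mentioned above, this yields one diversity score per image, which can then be averaged and reported with its standard deviation.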