SinDDM: A Single Image Denoising Diffusion Model
Authors: Vladimir Kulikov, Shahar Yadin, Matan Kleiner, Tomer Michaeli
ICML 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Table 1 reports a quantitative comparison to other single image generative models on all 12 images appearing in this paper (see App. G.1 for more comparisons). Each measure in the table is computed over 50 samples per training image (we report mean and standard deviation). As can be seen, the diversity of our generated samples (both pixel standard-deviation and average LPIPS distance between pairs of samples) is higher than the competing methods. (A sketch of these diversity measures appears after the table.) |
| Researcher Affiliation | Academia | Faculty of Electrical and Computer Engineering, Technion – Israel Institute of Technology, Haifa, Israel. Correspondence to: Vladimir Kulikov <vladimir.k@campus.technion.ac.il>. |
| Pseudocode | Yes | Algorithm 1: SinDDM Training and Algorithm 2: SinDDM Sampling. (A simplified sketch of the underlying diffusion training step appears after the table.) |
| Open Source Code | Yes | Results, code and the Supplementary Material are available on the project's webpage. |
| Open Datasets | No | We trained SinDDM on images of different styles, including urban and nature scenery as well as art paintings. The paper does not provide concrete access information or citations for these specific training images/datasets. |
| Dataset Splits | No | The paper trains on a single image and evaluates generation quality but does not describe any specific training/validation/test dataset splits. |
| Hardware Specification | Yes | The model has a total of 1.1×10⁶ parameters and its training on a 250×200 image takes around 7 hours on an A6000 GPU. |
| Software Dependencies | No | The paper mentions using the Adam optimizer and a pre-trained CLIP model but does not specify version numbers for any software dependencies. |
| Experiment Setup | Yes | We train the model for 120,000 steps using the Adam optimizer with its default parameters (see App. C for further details). Our model comprises 4 convolutional blocks, with a total receptive field of 35×35. The number of scales is chosen such that the area covered by the receptive field is as close as possible to 40% of the area of the entire image at scale 0. (A sketch of this scale-selection rule appears after the table.) |
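
The diversity measures quoted in the Research Type row (per-pixel standard deviation across samples and average LPIPS distance between sample pairs, over 50 samples per training image) can be computed roughly as in the following sketch. This is not the paper's evaluation code; the tensor layout and the `[-1, 1]` normalization expected by the `lpips` package are assumptions.

```python
import itertools
import torch
import lpips  # pip install lpips

def diversity_metrics(samples: torch.Tensor) -> tuple[float, float]:
    """Diversity over N generated samples for one training image.

    samples: (N, 3, H, W) tensor with values in [-1, 1] (assumed layout).
    Returns (mean per-pixel std across samples, mean pairwise LPIPS).
    """
    # Pixel standard-deviation: std over the sample axis, averaged
    # over channels and spatial positions.
    pixel_std = samples.std(dim=0).mean().item()

    # Average LPIPS distance over all unordered pairs of samples.
    loss_fn = lpips.LPIPS(net='alex')
    dists = [loss_fn(samples[i:i + 1], samples[j:j + 1]).item()
             for i, j in itertools.combinations(range(samples.shape[0]), 2)]
    return pixel_std, sum(dists) / len(dists)
```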
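
Algorithms 1 and 2 in the paper specify SinDDM's multi-scale training and sampling; the full procedures are given in the paper itself. As a non-authoritative sketch, the standard single-scale DDPM training step they build on looks roughly like this (the denoiser signature `model(x_t, t)` and the schedule argument are assumptions; SinDDM additionally conditions the denoiser on the scale, which is omitted here).

```python
import torch
import torch.nn.functional as F

def ddpm_training_step(model, x0: torch.Tensor,
                       alphas_cumprod: torch.Tensor) -> torch.Tensor:
    """One simplified denoising-diffusion training step.

    x0: (B, 3, H, W) batch of clean image crops.
    alphas_cumprod: (T,) cumulative products of the noise schedule.
    """
    B = x0.shape[0]
    # Sample a random timestep per example and Gaussian noise.
    t = torch.randint(0, alphas_cumprod.shape[0], (B,), device=x0.device)
    noise = torch.randn_like(x0)
    a_bar = alphas_cumprod[t].view(B, 1, 1, 1)
    # Forward process: x_t = sqrt(a_bar) * x0 + sqrt(1 - a_bar) * noise.
    x_t = a_bar.sqrt() * x0 + (1.0 - a_bar).sqrt() * noise
    # Train the denoiser to predict the injected noise.
    return F.mse_loss(model(x_t, t), noise)
```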
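
The scale-selection rule from the Experiment Setup row (choose the number of scales so that the 35×35 receptive field covers as close as possible to 40% of the image area at scale 0) can be illustrated as below. The per-scale downscaling factor here is a placeholder assumption; the paper's exact pyramid construction is specified in its appendix.

```python
def choose_num_scales(height: int, width: int, rf: int = 35,
                      target: float = 0.40, factor: float = 4 / 3) -> int:
    """Number of scales whose coarsest level (scale 0) gives a
    receptive-field coverage closest to `target` of the image area.

    `factor` is an assumed per-scale downscaling ratio, not the
    paper's specified value.
    """
    best_n, best_gap = 1, float('inf')
    for n in range(1, 10):
        # Image dimensions at scale 0 after (n - 1) downscaling steps.
        h0 = height / factor ** (n - 1)
        w0 = width / factor ** (n - 1)
        gap = abs(rf * rf / (h0 * w0) - target)
        if gap < best_gap:
            best_n, best_gap = n, gap
    return best_n

# Example: for the 250×200 training image quoted above.
print(choose_num_scales(250, 200))
```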