gDDIM: Generalized denoising diffusion implicit models

Authors: Qinsheng Zhang, Molei Tao, Yongxin Chen

ICLR 2023

Reproducibility Variable Result LLM Response
Research Type | Experimental | We validate gDDIM on two non-isotropic DMs: the blurring diffusion model (BDM) and the critically-damped Langevin diffusion model (CLD). We observe more than 20x acceleration in BDM. In CLD, a diffusion model that augments the diffusion process with velocity, our algorithm achieves an FID score of 2.26 on CIFAR10 with only 50 score function evaluations (NFEs), and an FID score of 2.86 with only 27 NFEs. Project page and code: https://github.com/qshzh/gDDIM. We conduct experiments with different DMs and sampling algorithms on CIFAR10 for quantitative comparison.
Researcher Affiliation | Academia | Qinsheng Zhang, Georgia Institute of Technology (qzhang419@gatech.edu); Molei Tao, Georgia Institute of Technology (mtao@gatech.edu); Yongxin Chen, Georgia Institute of Technology (yongchen@gatech.edu)
Pseudocode | Yes | Algorithm 1: Exponential multistep Predictor-Corrector
Open Source Code | Yes | Project page and code: https://github.com/qshzh/gDDIM.
Open Datasets | Yes | We conduct experiments with different DMs and sampling algorithms on CIFAR10 for quantitative comparison.
Dataset Splits | No | The paper discusses evaluation metrics (FID) and comparisons on CIFAR10 extensively but does not explicitly specify training, validation, or test splits with percentages or counts. CIFAR10 has standard splits, but these are not stated in the paper's text.
Hardware Specification | Yes | 4x A6000 GPUs
Software Dependencies | No | The paper mentions "We implemented gDDIM and related algorithms in Jax." and lists other code sources in Table 9, but it does not specify version numbers for Jax or any other software libraries required for reproducibility.
Experiment Setup | Yes | Table 4: Model architectures and hyperparameters.

Hyperparameter                | CIFAR10             | CELEBA
Model
  EMA rate                    | 0.9999              | 0.999
  # of Res Blocks per resolution | 8                | 2
  Normalization               | Group Normalization | Group Normalization
  Progressive input           | Residual            | None
  Progressive combine         | Sum                 | N/A
  Finite Impulse Response     | Enabled             | Disabled
  Embedding type              | Fourier             | Positional
  # of parameters             | 108M                | 62M
Training
  # of iterations             | 1M                  | 150k
  Optimizer                   | Adam                | Adam
  Learning rate               | 2e-4                | 2e-4
  Gradient norm clipping      | 1.0                 | 1.0
  Dropout                     | 0.1                 | 0.1
  Batch size per GPU          | 32                  | 32
  GPUs                        | 4 A6000             | 4 A6000
  Training time               | 79h                 | 16h
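The table above cites the paper's Algorithm 1 (Exponential multistep Predictor-Corrector). The paper's own algorithm is not reproduced here; as a rough illustration of the multistep idea, the sketch below applies a deterministic DDIM-style update with a 2-step linear extrapolation of the noise prediction. The noise model `eps_model` and the schedule `alpha_bar` are placeholder assumptions, not the authors' trained model or schedule.

```python
import numpy as np

def eps_model(x, t):
    # Placeholder noise-prediction network; a stand-in for the paper's
    # trained score model, which is not available here.
    return 0.1 * x

def alpha_bar(t):
    # Illustrative VP-style cumulative signal schedule (an assumption).
    return np.exp(-0.5 * t ** 2)

def ddim_multistep_sample(x, ts):
    """Deterministic DDIM-style sampler with a 2-step (Adams-Bashforth-like)
    extrapolation of the noise prediction, in the spirit of an exponential
    multistep predictor. A sketch only, not the paper's Algorithm 1."""
    eps_prev = None
    for t_cur, t_next in zip(ts[:-1], ts[1:]):
        eps = eps_model(x, t_cur)
        # Multistep extrapolation reuses the previous model evaluation,
        # which is how such samplers cut the number of NFEs.
        eps_hat = eps if eps_prev is None else 1.5 * eps - 0.5 * eps_prev
        a_cur, a_next = alpha_bar(t_cur), alpha_bar(t_next)
        # Predict the clean sample, then re-noise to the next time step.
        x0 = (x - np.sqrt(1.0 - a_cur) * eps_hat) / np.sqrt(a_cur)
        x = np.sqrt(a_next) * x0 + np.sqrt(1.0 - a_next) * eps_hat
        eps_prev = eps
    return x

rng = np.random.default_rng(0)
sample = ddim_multistep_sample(rng.standard_normal(4), np.linspace(1.0, 0.01, 28))
```

Each iteration costs one model evaluation, so a 28-point time grid corresponds to 27 NFEs, matching the granularity at which the paper reports its low-NFE results.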
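For readers scripting a reproduction, the CIFAR10 column of Table 4 can be collected into a single config dict. The key names below are hypothetical (the authors' actual configuration schema is not given in this report); the values are taken from the table.

```python
# Hypothetical config keys; values transcribed from Table 4 (CIFAR10 column).
cifar10_config = {
    "ema_rate": 0.9999,
    "res_blocks_per_resolution": 8,
    "normalization": "GroupNorm",
    "progressive_input": "residual",
    "progressive_combine": "sum",
    "finite_impulse_response": True,
    "embedding_type": "fourier",
    "num_parameters": "108M",
    "train_iterations": 1_000_000,
    "optimizer": "Adam",
    "learning_rate": 2e-4,
    "grad_norm_clip": 1.0,
    "dropout": 0.1,
    "batch_size_per_gpu": 32,
    "num_gpus": 4,
}
```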