Reducing Spatial Fitting Error in Distillation of Denoising Diffusion Models

Authors: Shengzhe Zhou, Zejian Li, Shengyuan Zhang, Lefan Hou, Changyuan Yang, Guang Yang, Zhiyuan Yang, Lingyun Sun

AAAI 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Empirically, our proposed model facilitates high-quality sample generation in a few function evaluations. We achieve an FID of 5.31 on CIFAR-10 and 9.39 on ImageNet 64×64 with only one step, outperforming existing diffusion methods.
Researcher Affiliation | Collaboration | 1) School of Software Technology, Zhejiang University; 2) College of Computer Science and Technology, Zhejiang University; 3) Alibaba Group
Pseudocode | No | The paper includes figures illustrating processes but no formal pseudocode or algorithm blocks.
Open Source Code | Yes | Project link: https://github.com/Sainzerjj/SFERD.
Open Datasets | Yes | Empirically, SFERD efficiently reduces the fitting error of the student model, leading to superior performance as compared to other distillation models on CIFAR-10 (Krizhevsky 2009) and ImageNet 64×64 (Deng et al. 2009).
Dataset Splits | No | The paper uses the standard CIFAR-10 and ImageNet 64×64 datasets and refers to 'sampling steps' and 'total steps', but it does not provide percentages or counts for training, validation, and test splits, nor does it mention a validation set or its usage.
Hardware Specification | No | The paper does not provide specific hardware details such as GPU models (e.g., NVIDIA A100), CPU models, or cloud computing instance types used for experiments.
Software Dependencies | No | The paper does not provide specific version numbers for any software dependencies, such as deep learning frameworks or libraries.
Experiment Setup | Yes | We conduct ablation experiments on the design of critical hyperparameters in the training of SFERD. All ablations are performed on the conditional ImageNet 64×64 using SFERD-PD (no semantic gradient predictor in the student) with 4 sampling steps unless otherwise stated. Attention threshold: to determine the attention threshold, we compute the scales 0.8, 0.9, 1.0, 1.1, and 1.2. The best metrics are obtained when ψ is 1.0. ... Attention guidance strength: we evaluate the effect of attention guidance strength, computing scales from 0 to 0.5. The best FID is achieved when w = 0.3. ... Gaussian blur strength: we evaluate the effect of Gaussian blur strength σ on performance. We test strength values of 1, 3, and 5, and obtain the best FID at σ = 3.
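
For concreteness, the ablation grid quoted in the Experiment Setup row can be written as a small one-factor-at-a-time hyperparameter sweep. The Python sketch below is illustrative only: train_fn and fid_fn are hypothetical placeholders (the real entry points live in the linked SFERD repository), and the spacing of the guidance-strength grid is assumed, since the paper only reports the range 0 to 0.5 with a best value of 0.3.

# Hypothetical sweep over the ablated hyperparameters: attention threshold
# (psi), attention guidance strength (w), and Gaussian blur strength (sigma).
# Defaults are the best values reported in the row above.
DEFAULTS = {"psi": 1.0, "w": 0.3, "sigma": 3}
GRIDS = {
    "psi": [0.8, 0.9, 1.0, 1.1, 1.2],      # best reported: 1.0
    "w": [0.0, 0.1, 0.2, 0.3, 0.4, 0.5],   # range 0-0.5; spacing assumed
    "sigma": [1, 3, 5],                    # best reported: 3
}

def run_ablations(train_fn, fid_fn):
    """Vary one hyperparameter at a time around the reported defaults.

    train_fn and fid_fn are placeholders for SFERD-PD training and FID
    evaluation (conditional ImageNet 64x64, 4 sampling steps); they are
    not the actual API of https://github.com/Sainzerjj/SFERD.
    """
    scores = {}
    for name, grid in GRIDS.items():
        for value in grid:
            cfg = {**DEFAULTS, name: value, "sampling_steps": 4}
            student = train_fn(**cfg)
            scores[(name, value)] = fid_fn(student)  # lower FID is better
    return scores

If the sweep reproduced the paper's numbers, picking the lowest-FID value per hyperparameter would recover the reported optima (ψ = 1.0, w = 0.3, σ = 3).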
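
The one-step generation claim in the Research Type row follows the usual single-NFE pattern for distilled diffusion models: draw Gaussian noise at the terminal timestep and take the student's clean-image prediction directly. A minimal PyTorch sketch, assuming an x0-prediction student network; the interface is hypothetical, not SFERD's actual API.

import torch

@torch.no_grad()
def one_step_sample(student, shape, t_max, device="cuda"):
    # NFE = 1: map pure noise at the terminal timestep straight to an
    # x0 estimate with a single forward pass of the distilled student.
    x_t = torch.randn(shape, device=device)            # x_T ~ N(0, I)
    t = torch.full((shape[0],), t_max, device=device)  # terminal timestep
    return student(x_t, t)                             # predicted clean image

Multi-step variants (e.g., the 4-step ablation setting above) would iterate this predict-then-renoise pattern over a short timestep schedule rather than returning after a single evaluation.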