Input Perturbation Reduces Exposure Bias in Diffusion Models

Authors: Mang Ning, Enver Sangineto, Angelo Porrello, Simone Calderara, Rita Cucchiara

ICML 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We empirically show that, without affecting the recall and precision, the proposed input perturbation leads to a significant improvement in the sample quality while reducing both the training and the inference times. For instance, on CelebA 64×64, we achieve a new state-of-the-art FID score of 1.27, while saving 37.5% of the training time.
Researcher Affiliation | Academia | (1) Department of Information and Computing Science, Utrecht University, the Netherlands; (2) Department of Engineering (DIEF), University of Modena and Reggio Emilia, Italy.
Pseudocode | Yes | Algorithm 1: DDPM Standard Training; Algorithm 2: DDPM Standard Sampling; Algorithm 3: DDPM-IP, Training with input perturbation (a hedged sketch of this training step follows the table).
Open Source Code | Yes | The code is available at https://github.com/forever208/DDPM-IP.
Open Datasets | Yes | We compare ADM-IP with ADM using CIFAR10, ImageNet 32×32, LSUN tower 64×64, CelebA 64×64 (Liu et al., 2015) and FFHQ 128×128.
Dataset Splits | No | The paper mentions using the full training set or subsets of it to compute reference statistics for FID, but it does not specify explicit train/validation/test splits for model training or hyperparameter tuning.
Hardware Specification | Yes | We use PyTorch 1.8 (Paszke et al., 2019) and trained all the models on different NVIDIA Tesla V100s (16 GB memory).
Software Dependencies | Yes | We use PyTorch 1.8 (Paszke et al., 2019) and trained all the models on different NVIDIA Tesla V100s (16 GB memory).
Experiment Setup | Yes | We refer to Appendix A.7 for the complete list of hyperparameters (e.g., the learning rate, the batch size, etc.) and network architecture settings, which are the same for both ADM and ADM-IP. Table 9: ADM and ADM-IP hyperparameter values. Table 10: DDIM and DDIM-IP hyperparameter values on the CIFAR10 dataset. Furthermore, we use 16-bit precision and loss scaling (Micikevicius et al., 2017) for mixed-precision training, while keeping 32-bit weights, EMA, and the optimizer state. We use an EMA rate of 0.9999 for all the experiments. (A brief EMA update sketch follows this table.)
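
The Algorithm 3 listing itself is not reproduced on this page. Purely as an illustration, the sketch below shows one way a DDPM training step with input perturbation could look in PyTorch, assuming the perturbation adds a second, scaled Gaussian term gamma * xi to the forward-process noise before the noised input is built; `model`, `alphas_cumprod`, and the default `gamma` value here are placeholders, not settings taken from the paper.

```python
import torch

def ddpm_ip_training_step(model, x0, alphas_cumprod, gamma=0.1):
    """One DDPM training step with input perturbation (hedged sketch).

    Assumes the standard epsilon-prediction objective on image batches
    shaped (B, C, H, W). `gamma` is an illustrative perturbation scale.
    """
    b = x0.shape[0]
    # Sample a random diffusion timestep per example.
    t = torch.randint(0, alphas_cumprod.shape[0], (b,), device=x0.device)
    a_bar = alphas_cumprod.to(x0.device)[t].view(b, 1, 1, 1)

    eps = torch.randn_like(x0)   # noise the network is trained to predict
    xi = torch.randn_like(x0)    # extra perturbation noise (input perturbation)

    # Perturbed network input: the clean forward-process noise eps is
    # replaced by eps + gamma * xi when forming the noised sample.
    y_t = a_bar.sqrt() * x0 + (1.0 - a_bar).sqrt() * (eps + gamma * xi)

    eps_pred = model(y_t, t)
    return torch.mean((eps - eps_pred) ** 2)
```

As the algorithm names above suggest, sampling remains the standard DDPM procedure; only the training input is perturbed.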
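
The EMA rate of 0.9999 mentioned in the experiment-setup row refers to an exponential moving average kept over the 32-bit weights. As a generic illustration (not the authors' code), such an average can be maintained roughly as follows:

```python
import copy
import torch

@torch.no_grad()
def update_ema(ema_model, model, rate=0.9999):
    """Exponential moving average of model parameters (illustrative sketch)."""
    for ema_p, p in zip(ema_model.parameters(), model.parameters()):
        ema_p.mul_(rate).add_(p, alpha=1.0 - rate)

# Usage sketch: keep a full-precision EMA copy and update it after every optimizer step.
# ema_model = copy.deepcopy(model).float()
# for batch in loader:
#     ... optimizer step ...
#     update_ema(ema_model, model)
```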