Q-DM: An Efficient Low-bit Quantized Diffusion Model

Authors: Yanjing Li, Sheng Xu, Xianbin Cao, Xiao Sun, Baochang Zhang

NeurIPS 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate our methods on popular DDPM and DDIM models. Extensive experimental results show that our method achieves a much better performance than the prior arts. For example, the 4-bit Q-DM theoretically accelerates the 1000-step DDPM by 7.8× and achieves a FID score of 5.17, on the unconditional CIFAR-10 dataset. Extensive experiments on the CIFAR-10 and ImageNet datasets show that our Q-DM outperforms the baseline and 8-bit PTQ method by a large margin, and achieves comparable performances as the full-precision counterparts with a considerable acceleration rate.
Researcher Affiliation | Collaboration | 1 Beihang University, 2 Shanghai Artificial Intelligence Laboratory, 3 Zhongguancun Laboratory, 4 Nanchang Institute of Technology
Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide an explicit statement or link indicating the availability of open-source code for the described methodology.
Open Datasets | Yes | We evaluate our method on two datasets including 32×32 generating size in CIFAR-10 [13] and 64×64 generating size in ImageNet [14].
Dataset Splits | No | The paper mentions training and testing but does not explicitly describe validation splits or how the data was partitioned for validation.
Hardware Specification | No | The paper does not provide specific hardware details such as the GPU or CPU models used for running the experiments.
Software Dependencies | No | The paper mentions using DDPM and DDIM models but does not provide specific version numbers for software dependencies or libraries (e.g., Python, PyTorch, CUDA versions).
Experiment Setup | Yes | All the training settings are the same as DDPM [10]. For DDIM sampler, we set η in DDIM [32] as 0.5 for the best performance. We set the training timestep T = 1000 for all experiments, following [10]. We set the forward process variances to constants increasing linearly from β1 = 1e-4 to βT = 0.02. To represent the reverse process, we use a U-Net backbone, following [10, 32]. Parameters are shared across time, which is specified to the network using the Transformer sinusoidal position embedding [36]. We use self-attention at the 16×16 feature map resolution [36, 37].
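
To make the quoted setup concrete, below is a minimal sketch of the schedule described in the Experiment Setup row: T = 1000 steps, forward-process variances increasing linearly from β1 = 1e-4 to βT = 0.02, a Transformer-style sinusoidal timestep embedding, and the DDIM step noise scale for η = 0.5. This is not the authors' code (the report notes none was released); the function names timestep_embedding and ddim_sigma, the embedding width, and the NumPy implementation are illustrative assumptions.

```python
# Minimal sketch, assuming a NumPy reimplementation of the schedule quoted above.
# Not the authors' code; names and embedding width are illustrative choices.
import numpy as np

T = 1000                                   # training timesteps, as in DDPM [10]
betas = np.linspace(1e-4, 0.02, T)         # linear forward-process variances beta_1..beta_T
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)            # cumulative products of alphas up to step t

def timestep_embedding(t: np.ndarray, dim: int = 128) -> np.ndarray:
    """Transformer sinusoidal position embedding used to condition the U-Net on t."""
    half = dim // 2
    freqs = np.exp(-np.log(10000.0) * np.arange(half) / half)
    args = t[:, None].astype(np.float64) * freqs[None, :]
    return np.concatenate([np.sin(args), np.cos(args)], axis=-1)

def ddim_sigma(t: int, t_prev: int, eta: float = 0.5) -> float:
    """DDIM reverse-step noise scale between timesteps t and t_prev (DDIM, Eq. 16).

    eta = 0 gives deterministic sampling; eta = 1 recovers the DDPM posterior
    variance. The report quotes eta = 0.5 as the best-performing setting.
    """
    ab_t, ab_prev = alpha_bars[t], alpha_bars[t_prev]
    return float(eta * np.sqrt((1 - ab_prev) / (1 - ab_t) * (1 - ab_t / ab_prev)))

# Example: embeddings for a small batch of timesteps, and the sigma of one stride.
emb = timestep_embedding(np.array([0, 500, 999]))
print(emb.shape, ddim_sigma(t=500, t_prev=450, eta=0.5))
```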