Q-DM: An Efficient Low-bit Quantized Diffusion Model
Authors: Yanjing Li, Sheng Xu, Xianbin Cao, Xiao Sun, Baochang Zhang
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate our methods on popular DDPM and DDIM models. Extensive experimental results show that our method achieves a much better performance than the prior arts. For example, the 4-bit Q-DM theoretically accelerates the 1000-step DDPM by 7.8× and achieves an FID score of 5.17 on the unconditional CIFAR-10 dataset. Extensive experiments on the CIFAR-10 and ImageNet datasets show that our Q-DM outperforms the baseline and 8-bit PTQ method by a large margin, and achieves comparable performance to the full-precision counterparts with a considerable acceleration rate. (A hedged sketch of low-bit uniform quantization follows the table.) |
| Researcher Affiliation | Collaboration | ¹Beihang University, ²Shanghai Artificial Intelligence Laboratory, ³Zhongguancun Laboratory, ⁴Nanchang Institute of Technology |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide an explicit statement or link indicating the availability of open-source code for the described methodology. |
| Open Datasets | Yes | We evaluate our method on two datasets including 32×32 generating size in CIFAR-10 [13] and 64×64 generating size in ImageNet [14]. |
| Dataset Splits | No | The paper mentions training and testing but does not explicitly provide details about specific validation dataset splits or methodology for data partitioning for validation. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU or CPU models used for running the experiments. |
| Software Dependencies | No | The paper mentions using DDPM and DDIM models but does not provide specific version numbers for software dependencies or libraries (e.g., Python, PyTorch, CUDA versions). |
| Experiment Setup | Yes | All the training settings are the same as DDPM [10]. For the DDIM sampler, we set η in DDIM [32] to 0.5 for the best performance. We set the training timestep T = 1000 for all experiments, following [10]. We set the forward process variances to constants increasing linearly from β1 = 1e-4 to βT = 0.02. To represent the reverse process, we use a U-Net backbone, following [10, 32]. Parameters are shared across time, which is specified to the network using the Transformer sinusoidal position embedding [36]. We use self-attention at the 16×16 feature map resolution [36, 37]. (A sketch of this schedule and embedding follows the table.) |
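
For context on the low-bit setting referenced in the Research Type row: the paper releases no code, so below is only a minimal sketch of generic symmetric uniform quantization to b bits, not Q-DM's actual quantizer. The function name `uniform_quantize` and the per-tensor max-abs scaling are illustrative assumptions.

```python
import torch

def uniform_quantize(x: torch.Tensor, bits: int = 4) -> torch.Tensor:
    """Symmetric uniform fake-quantization to `bits` bits.

    A generic sketch: round to the signed integer grid, then dequantize.
    This is NOT the paper's Q-DM quantizer, which is not publicly available.
    """
    qmax = 2 ** (bits - 1) - 1                      # e.g. 7 for signed 4-bit
    scale = x.abs().max().clamp(min=1e-8) / qmax    # per-tensor scale (assumed)
    q = torch.clamp(torch.round(x / scale), -qmax - 1, qmax)
    return q * scale                                # dequantized values

# Example: quantize a weight tensor to 4 bits and inspect the error
w = torch.randn(64, 64)
w_q = uniform_quantize(w, bits=4)
print((w - w_q).abs().mean())
```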
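
The Experiment Setup row states standard DDPM choices: T = 1000 steps, variances rising linearly from β1 = 1e-4 to βT = 0.02, and a Transformer sinusoidal position embedding for the shared-across-time parameters. A minimal PyTorch sketch of both is below; the helper name `sinusoidal_embedding` and the embedding dimension of 128 are assumptions for illustration.

```python
import torch

# Linear forward-process variance schedule from the stated setup:
# T = 1000, beta_1 = 1e-4 rising linearly to beta_T = 0.02.
T = 1000
betas = torch.linspace(1e-4, 0.02, T)
alphas = 1.0 - betas
alpha_bars = torch.cumprod(alphas, dim=0)  # cumulative product used for closed-form noising

def sinusoidal_embedding(t: torch.Tensor, dim: int = 128) -> torch.Tensor:
    """Transformer-style sinusoidal position embedding of timestep t.

    A standard sketch of the embedding used to condition the U-Net;
    dim=128 is an assumed illustrative width.
    """
    half = dim // 2
    freqs = torch.exp(-torch.log(torch.tensor(10000.0)) * torch.arange(half) / half)
    args = t.float()[:, None] * freqs[None, :]
    return torch.cat([torch.sin(args), torch.cos(args)], dim=-1)

# Example: embed three timesteps across the schedule
emb = sinusoidal_embedding(torch.tensor([0, 500, 999]))
print(emb.shape)  # torch.Size([3, 128])
```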