EfficientDM: Efficient Quantization-Aware Fine-Tuning of Low-Bit Diffusion Models
Authors: Yefei He, Jing Liu, Weijia Wu, Hong Zhou, Bohan Zhuang
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experimental results demonstrate that our method significantly outperforms previous PTQ-based diffusion models while maintaining similar time and data efficiency. Specifically, there is only a marginal 0.05 sFID increase when quantizing both weights and activations of LDM-4 to 4-bit on ImageNet 256×256. Compared to QAT-based methods, our EfficientDM also boasts a 16.2× faster quantization speed with comparable generation quality, rendering it a compelling choice for practical applications. |
| Researcher Affiliation | Academia | Yefei He¹, Jing Liu², Weijia Wu¹, Hong Zhou¹, Bohan Zhuang² (¹Zhejiang University, China; ²ZIP Lab, Monash University, Australia) |
| Pseudocode | No | The paper describes the proposed methods in text and with diagrams (Figure 2), but does not include any explicit pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code is available at https://github.com/ThisisBillhe/EfficientDM. |
| Open Datasets | Yes | For experiments with DDIM, we evaluate it on the CIFAR-10 dataset (Krizhevsky & Hinton, 2009). Experiments with LDM are conducted on two standard benchmarks: ImageNet (Deng et al., 2009) and LSUN (Yu et al., 2015). |
| Dataset Splits | No | The paper mentions evaluating on CIFAR-10, ImageNet, and LSUN datasets and describes the fine-tuning process, but it does not explicitly provide training/validation/test dataset splits with percentages or sample counts. |
| Hardware Specification | Yes | In this section, we evaluated the latency of matrix multiplication and convolution operations in both quantized and full-precision diffusion models, utilizing an RTX3090 GPU and the CUTLASS (Kerr et al., 2017) implementation, as demonstrated in Table C. |
| Software Dependencies | No | The paper mentions 'Nvidia's CUTLASS (Kerr et al., 2017) implementation' and 'ADM's TensorFlow evaluation suite (Dhariwal & Nichol, 2021)' but does not specify version numbers for these or other software dependencies. |
| Experiment Setup | Yes | For the proposed EfficientDM framework, we fine-tune LoRA weights and quantization parameters for 16K iterations with a batch size of 4 on LDM models and 64 on DDIM models, respectively. The number of denoising steps for the fine-tuning is set to 100. We employ Adam (Kingma & Ba, 2014) optimizers with a learning rate of 5e-4. |
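
To make the Experiment Setup row concrete, here is a minimal PyTorch sketch of the kind of setup it describes: only LoRA factors and a learnable quantization scale receive gradients while the pretrained weight stays frozen, optimized with Adam at the quoted learning rate of 5e-4. This is not the authors' implementation (that is in the linked repository); the `QuantLoRALinear` class, the rank, and the LSQ-style straight-through quantizer are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

class QuantLoRALinear(torch.nn.Module):
    """Illustrative 4-bit weight-quantized linear layer with LoRA factors.

    A minimal stand-in (not EfficientDM's actual layer) showing which
    parameters are trainable (LoRA factors + quantization scale) and
    which are frozen (the pretrained weight).
    """

    def __init__(self, in_features, out_features, rank=4, n_bits=4):
        super().__init__()
        self.weight = torch.nn.Parameter(
            torch.randn(out_features, in_features), requires_grad=False)  # frozen
        self.lora_a = torch.nn.Parameter(torch.randn(rank, in_features) * 0.01)
        self.lora_b = torch.nn.Parameter(torch.zeros(out_features, rank))  # zero init
        self.qmax = 2 ** (n_bits - 1) - 1
        # Learnable step size, initialized from the weight range (LSQ-style).
        self.scale = torch.nn.Parameter(self.weight.abs().max() / self.qmax)

    def forward(self, x):
        w = self.weight + self.lora_b @ self.lora_a   # merge the LoRA update
        w_s = w / self.scale
        w_q = w_s + (w_s.round() - w_s).detach()      # straight-through rounding
        w_q = w_q.clamp(-self.qmax - 1, self.qmax)    # clip to the 4-bit range
        return F.linear(x, w_q * self.scale)          # dequantize and apply

layer = QuantLoRALinear(64, 64)
# Only LoRA factors and the quantization scale receive gradients.
trainable = [p for p in layer.parameters() if p.requires_grad]
opt = torch.optim.Adam(trainable, lr=5e-4)            # learning rate from the paper

x = torch.randn(4, 64)                                # batch size 4, as on LDM
target = torch.randn(4, 64)                           # stand-in distillation target
loss = F.mse_loss(layer(x), target)
loss.backward()
opt.step()
```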
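
The latency measurement described in the Hardware Specification row can likewise be approximated with a generic CUDA-event timing harness. The sketch below times only a full-precision (fp16) matmul in PyTorch; the paper's low-bit kernels come from Nvidia's CUTLASS and are not reproduced here. The matrix sizes and iteration counts are arbitrary assumptions.

```python
import torch

def time_matmul(m=4096, n=4096, k=4096, warmup=10, iters=100):
    """Return average latency (ms) of an fp16 matmul on the current GPU."""
    a = torch.randn(m, k, device="cuda", dtype=torch.float16)
    b = torch.randn(k, n, device="cuda", dtype=torch.float16)
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    for _ in range(warmup):                  # warm up kernels and caches
        _ = a @ b
    torch.cuda.synchronize()
    start.record()
    for _ in range(iters):
        _ = a @ b
    end.record()
    torch.cuda.synchronize()                 # wait for all queued kernels
    return start.elapsed_time(end) / iters   # milliseconds per matmul

if __name__ == "__main__":
    print(f"{time_matmul():.3f} ms per matmul")
```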