EfficientDM: Efficient Quantization-Aware Fine-Tuning of Low-Bit Diffusion Models

Authors: Yefei He, Jing Liu, Weijia Wu, Hong Zhou, Bohan Zhuang

ICLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experimental results demonstrate that our method significantly outperforms previous PTQ-based diffusion models while maintaining similar time and data efficiency. Specifically, there is only a marginal 0.05 sFID increase when quantizing both weights and activations of LDM-4 to 4-bit on ImageNet 256×256. Compared to QAT-based methods, our EfficientDM also boasts a 16.2× faster quantization speed with comparable generation quality, rendering it a compelling choice for practical applications.
Researcher Affiliation | Academia | Yefei He (1), Jing Liu (2), Weijia Wu (1), Hong Zhou (1), Bohan Zhuang (2); (1) Zhejiang University, China; (2) ZIP Lab, Monash University, Australia
Pseudocode | No | The paper describes the proposed methods in text and with diagrams (Figure 2), but does not include any explicit pseudocode or algorithm blocks.
Open Source Code | Yes | Code is available at https://github.com/ThisisBillhe/EfficientDM.
Open Datasets | Yes | For experiments with DDIM, we evaluate it on the CIFAR-10 dataset (Krizhevsky & Hinton, 2009). Experiments with LDM are conducted on two standard benchmarks: ImageNet (Deng et al., 2009) and LSUN (Yu et al., 2015).
Dataset Splits | No | The paper mentions evaluating on the CIFAR-10, ImageNet, and LSUN datasets and describes the fine-tuning process, but it does not explicitly provide training/validation/test dataset splits with percentages or sample counts.
Hardware Specification | Yes | In this section, we evaluated the latency of matrix multiplication and convolution operations in both quantized and full-precision diffusion models, utilizing an RTX 3090 GPU and the CUTLASS (Kerr et al., 2017) implementation, as demonstrated in Table C. (A generic GPU timing sketch follows the table.)
Software Dependencies | No | The paper mentions "Nvidia's CUTLASS (Kerr et al., 2017) implementation" and "ADM's TensorFlow evaluation suite (Dhariwal & Nichol, 2021)" but does not specify version numbers for these or other software dependencies.
Experiment Setup | Yes | For the proposed EfficientDM framework, we fine-tune LoRA weights and quantization parameters for 16K iterations with a batch size of 4 on LDM models and 64 on DDIM models, respectively. The number of denoising steps for the fine-tuning is set to 100. We employ Adam (Kingma & Ba, 2014) optimizers with a learning rate of 5e-4. (A hedged configuration sketch follows the table.)
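
The hyperparameters quoted in the Experiment Setup row can be collected into a short configuration sketch. The snippet below is a minimal, hypothetical PyTorch setup and not the authors' code: lora_params and quant_params are placeholder tensors standing in for the trainable LoRA weights and quantization parameters of a quantized diffusion model, and the loss is a dummy stand-in for the paper's fine-tuning objective.

    # Hypothetical sketch of the reported fine-tuning configuration (not the authors' code).
    import itertools
    import torch

    NUM_ITERATIONS = 16_000      # "16K iterations"
    BATCH_SIZE_LDM = 4           # batch size reported for LDM models
    BATCH_SIZE_DDIM = 64         # batch size reported for DDIM models
    NUM_DENOISING_STEPS = 100    # denoising steps used during fine-tuning
    LEARNING_RATE = 5e-4         # Adam learning rate

    # Placeholder parameter groups standing in for the trainable LoRA weights
    # and quantization parameters gathered from a quantized diffusion model.
    lora_params = [torch.zeros(8, 320, requires_grad=True)]
    quant_params = [torch.ones(1, requires_grad=True)]

    optimizer = torch.optim.Adam(
        itertools.chain(lora_params, quant_params), lr=LEARNING_RATE
    )

    for step in range(NUM_ITERATIONS):
        optimizer.zero_grad()
        # The paper's actual objective (computed over the denoising trajectory)
        # is omitted; a dummy loss keeps this sketch self-contained and runnable.
        loss = sum(p.sum() for p in itertools.chain(lora_params, quant_params))
        loss.backward()
        optimizer.step()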
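
The Hardware Specification row refers to latency measurements of matrix multiplication and convolution kernels with CUTLASS on an RTX 3090. As a generic illustration of such a measurement, the sketch below times a plain FP16 matmul with PyTorch CUDA events; it does not use the paper's CUTLASS INT4 kernels, and the matrix sizes are arbitrary assumptions.

    # Generic GPU matmul latency measurement with PyTorch CUDA events.
    # This does NOT reproduce the paper's CUTLASS INT4 kernels.
    import torch

    def time_matmul(m=4096, k=4096, n=4096, dtype=torch.float16, iters=50):
        a = torch.randn(m, k, device="cuda", dtype=dtype)
        b = torch.randn(k, n, device="cuda", dtype=dtype)
        # Warm-up to exclude kernel-launch and allocator overhead.
        for _ in range(10):
            torch.matmul(a, b)
        torch.cuda.synchronize()
        start = torch.cuda.Event(enable_timing=True)
        end = torch.cuda.Event(enable_timing=True)
        start.record()
        for _ in range(iters):
            torch.matmul(a, b)
        end.record()
        torch.cuda.synchronize()
        return start.elapsed_time(end) / iters  # milliseconds per matmul

    if torch.cuda.is_available():
        print(f"{time_matmul():.3f} ms per 4096x4096 FP16 matmul")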