BiDM: Pushing the Limit of Quantization for Diffusion Models

Authors: Xingyu Zheng, Xianglong Liu, Yichen Bian, Xudong Ma, Yulun Zhang, Jiakai Wang, Jinyang Guo, Haotong Qin

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments show that, compared to existing SOTA fully binarized methods, BiDM significantly improves accuracy while maintaining the same inference efficiency, surpassing all existing baselines across various evaluation metrics. Specifically, in pixel-space diffusion models, BiDM is the only method that raises the IS to 5.18, close to the level of full-precision models and 0.95 higher than the best baseline method. In LDM, BiDM reduces the FID on LSUN-Bedrooms from the SOTA method's 59.44 to an impressive 22.74, while fully benefiting from 28.0× storage and 52.7× OPs savings.
Researcher Affiliation | Academia | Beihang University, Shanghai Jiao Tong University, Zhongguancun Laboratory, ETH Zürich
Pseudocode | No | The paper does not include any pseudocode or algorithm blocks.
Open Source Code | Yes | The paper's reproducibility checklist answers "[Yes]" to the question of open access to data and code, with the justification: "The paper provides open access to the data and code, with sufficient instructions to faithfully reproduce the main experimental results, as described in the supplemental material."
Open Datasets | Yes | We conduct experiments on various datasets, including CIFAR-10 32×32 [27], LSUN-Bedrooms 256×256 [72], LSUN-Churches 256×256 [72] and FFHQ 256×256 [25] over pixel space diffusion models [19] and latent space diffusion models [50].
Dataset Splits | No | The paper describes its training process and evaluation against reference batches but does not specify explicit train/validation/test splits (e.g., percentages or counts) for the datasets themselves.
Hardware Specification | Yes | All our experiments are conducted on a server with an NVIDIA A100 40GB GPU.
Software Dependencies | No | We utilized the general deployment library Larq [8] on a Qualcomm Snapdragon 855 Plus to test the actual runtime efficiency of the aforementioned single convolution. The runtime results for a single inference are summarized in Table 8. While Larq is mentioned, no specific version number is provided for it or any other software dependencies.
Experiment Setup | Yes | For the CIFAR-10 dataset, we set the learning rate to 6e-5 and the batch size to 64 during training. The training process consisted of 100k iterations, and during sampling, we used 100 sampling steps. For the LSUN-Bedrooms, LSUN-Churches and FFHQ datasets, the learning rate was set to 2e-5 and the batch size to 4 during training. The training consisted of 200k iterations, with 200 steps used during the denoising phase.
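The hyperparameters quoted in the Experiment Setup row can be collected into a per-dataset configuration sketch. This is a hypothetical layout for reference only (the dataset keys, field names, and `get_config` helper are our own; the paper's actual training code may organize these differently), with the values taken directly from the text above:

```python
# Hypothetical per-dataset training configs built from the hyperparameters
# reported in the paper's experiment setup (values as quoted above).
CONFIGS = {
    "cifar10": {
        "lr": 6e-5, "batch_size": 64,
        "train_iters": 100_000, "sampling_steps": 100,
    },
    "lsun_bedrooms": {
        "lr": 2e-5, "batch_size": 4,
        "train_iters": 200_000, "sampling_steps": 200,
    },
    "lsun_churches": {
        "lr": 2e-5, "batch_size": 4,
        "train_iters": 200_000, "sampling_steps": 200,
    },
    "ffhq": {
        "lr": 2e-5, "batch_size": 4,
        "train_iters": 200_000, "sampling_steps": 200,
    },
}

def get_config(dataset: str) -> dict:
    """Return the training configuration for a dataset key (KeyError if unknown)."""
    return CONFIGS[dataset]
```

Note the split the paper describes: the pixel-space CIFAR-10 run uses a larger batch and fewer iterations/steps, while all three 256×256 latent-space runs share one setting.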