DiP-GO: A Diffusion Pruner via Few-step Gradient Optimization

Authors: Haowei Zhu, Dehua Tang, Ji Liu, Mingjie Lu, Jintu Zheng, Jinzhang Peng, Dong Li, Yu Wang, Fan Jiang, Lu Tian, Spandan Tiwari, Ashish Sirasao, Jun-Hai Yong, Bin Wang, Emad Barsoum

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We conduct extensive experiments on various diffusion models, including the Stable Diffusion series and DiTs. Our DiP-GO approach achieves a 4.4× speedup for SD-1.5 without any loss of accuracy, significantly outperforming previous state-of-the-art methods.
Researcher Affiliation | Collaboration | (1) Advanced Micro Devices, Inc.; (2) Tsinghua University
Pseudocode | Yes | Here, we show the details of our proposed post-processing algorithm via pseudocode as follows. Algorithm 1: Diffusion Pruner
Open Source Code | No | Does the paper provide open access to the data and code, with sufficient instructions to faithfully reproduce the main experimental results, as described in the supplemental material? Answer: [No] Justification: The code will be released after it successfully passes our company's internal review.
Open Datasets | Yes | We utilize a subset of the DiffusionDB [53] dataset comprising 1000 samples to train our pruner network, using only textual prompts. Following previous works [29, 22], we evaluate DiP-GO on three public datasets, i.e., PartiPrompts [54], MS-COCO 2017 [55], and ImageNet [56].
Dataset Splits | Yes | We utilize a subset of the DiffusionDB [53] dataset comprising 1000 samples to train our pruner network, using only textual prompts. Following previous works [29, 22], we evaluate DiP-GO on three public datasets, i.e., PartiPrompts [54], MS-COCO 2017 [55], and ImageNet [56]. Comparison of computational complexity, inference speed, CLIP Score, and FID on the MS-COCO 2017 validation set for SD-2.1.
Hardware Specification | Yes | To evaluate inference efficiency, we measure the Multiply-Accumulate Calculations (MACs), Parameters (Params), and Speedup for all models with a batch size of 1 in the PyTorch 2.1 environment on the AMD MI250 platform. (A hedged profiling sketch follows the table.)
Software Dependencies | Yes | To evaluate inference efficiency, we measure the Multiply-Accumulate Calculations (MACs), Parameters (Params), and Speedup for all models with a batch size of 1 in the PyTorch 2.1 environment on the AMD MI250 platform.
Experiment Setup | Yes | For Stable Diffusion models, we utilize the SGD optimizer with a cosine learning-rate schedule for 1000 training steps. The batch size, learning rate, and weight decay are set to 1, 0.1, and 1e-4, respectively. The hyperparameters α_s, τ, and the query embedding dimension D, along with the encoder layer number L, are set to 1, 0.2, 512, and 1, respectively. For the Diffusion Transformer model, we use the same experimental configuration as for the Stable Diffusion model, except that the learning rate is set to 1e-3. (A hedged configuration sketch follows the table.)
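
The Experiment Setup row gives the full pruner-training recipe for the Stable Diffusion models. The sketch below is a minimal illustration of how those reported hyperparameters (SGD, cosine schedule, 1000 steps, batch size 1, learning rate 0.1, weight decay 1e-4, query dimension D = 512) could map onto a PyTorch training loop; the pruner architecture and the loss shown here are hypothetical placeholders, not the authors' released code.

```python
import torch

# Hypothetical stand-in for the paper's pruner (gate-prediction) network;
# only the optimizer/schedule settings below come from the reported setup.
pruner = torch.nn.Sequential(
    torch.nn.Linear(512, 512),   # query embedding dimension D = 512 (reported)
    torch.nn.ReLU(),
    torch.nn.Linear(512, 1),
)

# Reported Stable Diffusion settings: SGD, cosine schedule, 1000 steps,
# batch size 1, lr 0.1, weight decay 1e-4 (the DiT variant uses lr 1e-3).
optimizer = torch.optim.SGD(pruner.parameters(), lr=0.1, weight_decay=1e-4)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=1000)

for step in range(1000):
    queries = torch.randn(1, 512)      # placeholder for a batch of prompt-derived queries
    gate_logits = pruner(queries)
    loss = gate_logits.abs().mean()    # placeholder objective, not the paper's pruning loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    scheduler.step()
```

The point of the sketch is only that every reported hyperparameter has an obvious slot in a standard training loop, which makes this part of the setup straightforward to reproduce once the pruner architecture and loss are released.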
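The Hardware Specification and Software Dependencies rows describe the efficiency protocol (MACs, Params, and Speedup at batch size 1 under PyTorch 2.1 on an AMD MI250). The excerpt does not name a profiling tool, so the snippet below is only one plausible way to collect such numbers, using the third-party thop package for MAC/parameter counting plus simple wall-clock timing; the stand-in model and tensor shapes are illustrative, not the actual Stable Diffusion UNet.

```python
import time
import torch
from thop import profile  # third-party MAC/parameter counter; the paper does not name its tool

# Stand-in module; in practice this would be the SD-1.5 / SD-2.1 UNet (or DiT) being pruned.
model = torch.nn.Sequential(
    torch.nn.Conv2d(4, 320, kernel_size=3, padding=1),
    torch.nn.SiLU(),
    torch.nn.Conv2d(320, 4, kernel_size=3, padding=1),
).eval()

latent = torch.randn(1, 4, 64, 64)  # batch size 1, as in the reported protocol

# MACs and parameter count.
macs, params = profile(model, inputs=(latent,), verbose=False)
print(f"MACs: {macs / 1e9:.2f} G, Params: {params / 1e6:.2f} M")

# Wall-clock latency; speedup = baseline latency / pruned latency.
with torch.no_grad():
    for _ in range(5):                       # warm-up iterations
        model(latent)
    start = time.perf_counter()
    for _ in range(20):
        model(latent)
    latency = (time.perf_counter() - start) / 20
print(f"Mean latency per forward pass: {latency * 1e3:.2f} ms")
```

On an accelerator, the forward passes would additionally need a device synchronization before reading the timer; the structure of the measurement is otherwise the same.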