DiP-GO: A Diffusion Pruner via Few-step Gradient Optimization
Authors: Haowei Zhu, Dehua Tang, Ji Liu, Mingjie Lu, Jintu Zheng, Jinzhang Peng, Dong Li, Yu Wang, Fan Jiang, Lu Tian, Spandan Tiwari, Ashish Sirasao, Jun-Hai Yong, Bin Wang, Emad Barsoum
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct extensive experiments on various diffusion models including the Stable Diffusion series and DiTs. Our DiP-GO approach achieves a 4.4× speedup for SD-1.5 without any loss of accuracy, significantly outperforming the previous state-of-the-art methods. |
| Researcher Affiliation | Collaboration | 1Advanced Micro Devices, Inc. 2Tsinghua University |
| Pseudocode | Yes | Here, we show the details of our proposed post-processing algorithm via pseudocode as follows. Algorithm 1: Diffusion Pruner |
| Open Source Code | No | Does the paper provide open access to the data and code, with sufficient instructions to faithfully reproduce the main experimental results, as described in supplemental material? Answer: [No] Justification: The code will be released after it successfully passes our company's internal review. |
| Open Datasets | Yes | We utilize a subset of the DiffusionDB [53] dataset comprising 1000 samples to train our pruner network, utilizing only textual prompts. Following previous works [29, 22], we evaluate DiP-GO on three public datasets, i.e., PartiPrompts [54], MS-COCO 2017 [55] and ImageNet [56]. |
| Dataset Splits | Yes | We utilize a subset of the DiffusionDB [53] dataset comprising 1000 samples to train our pruner network, utilizing only textual prompts. Following previous works [29, 22], we evaluate DiP-GO on three public datasets, i.e., PartiPrompts [54], MS-COCO 2017 [55] and ImageNet [56]. Comparison of computational complexity, inference speed, CLIP Score and FID on the MS-COCO 2017 validation set on SD-2.1. |
| Hardware Specification | Yes | To evaluate the inference efficiency, we evaluate the Multiply-Accumulate Calculations (MACs), Parameters (Params), and Speedup for all models with a batch size of 1 in the PyTorch 2.1 environment on the AMD MI250 platform. |
| Software Dependencies | Yes | To evaluate the inference efficiency, we evaluate the Multiply-Accumulate Calculations (MACs), Parameters (Params), and Speedup for all models with a batch size of 1 in the PyTorch 2.1 environment on the AMD MI250 platform. |
| Experiment Setup | Yes | For Stable Diffusion models, we utilize the SGD optimizer with a cosine learning-rate schedule for 1000 training steps. The batch size, learning rate, and weight decay are set to 1, 0.1, and 1e-4, respectively. The hyperparameters α_s, τ, the query embedding dimension D, and the encoder layer number L are set to 1, 0.2, 512, and 1, respectively. For the Diffusion Transformer model, we use the same experimental configuration as for the Stable Diffusion model, except that the learning rate is set to 1e-3. |
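The quoted experiment setup is concrete enough to sketch in code. Below is a minimal, hedged PyTorch sketch of that training configuration (SGD with a cosine schedule, 1000 steps, batch size 1, lr 0.1, weight decay 1e-4, D = 512, L = 1). The `PrunerNet` module, its query count, the attention head count, and the loss are illustrative placeholders, not the authors' implementation.

```python
# Hedged sketch of the reported pruner-training configuration.
# Only the optimizer, schedule, step count, lr, weight decay, D=512 and L=1
# come from the paper's quoted setup; everything else is a placeholder.
import torch
import torch.nn as nn

class PrunerNet(nn.Module):
    """Placeholder pruner: learnable query embeddings, one Transformer
    encoder layer (D = 512, L = 1 as reported), and per-query gate logits."""
    def __init__(self, num_queries: int = 50, dim: int = 512):
        super().__init__()
        self.queries = nn.Parameter(torch.randn(num_queries, dim))
        self.encoder = nn.TransformerEncoderLayer(d_model=dim, nhead=8, batch_first=True)
        self.head = nn.Linear(dim, 1)

    def forward(self) -> torch.Tensor:
        x = self.encoder(self.queries.unsqueeze(0))      # (1, num_queries, dim)
        return torch.sigmoid(self.head(x)).squeeze(-1)   # gate probabilities in [0, 1]

model = PrunerNet()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, weight_decay=1e-4)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=1000)

for step in range(1000):  # 1000 training steps, batch size 1
    gates = model()
    # Placeholder objective: a plain sparsity term standing in for the paper's
    # weighted combination of consistency and sparsity losses (alpha_s = 1).
    loss = gates.mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    scheduler.step()
```

For the DiT experiments, the same loop would apply with the learning rate changed to 1e-3, per the quoted setup.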