Efficient Spatially Sparse Inference for Conditional GANs and Diffusion Models
Authors: Muyang Li, Ji Lin, Chenlin Meng, Stefano Ermon, Song Han, Jun-Yan Zhu
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | To evaluate our method, we automatically create new image editing benchmark datasets on LSUN Church [11] and Cityscapes [12]. Without loss of visual fidelity, we reduce the computation of DDIM [5] by 7.5×, Progressive Distillation [13] by 2.7×, and GauGAN by 18×, measured by MACs. Compared to existing generative model acceleration methods [10, 14, 15, 16, 17, 18, 19], our method directly uses the off-the-shelf pre-trained weights and could be applied to these methods as a plugin. When applied to GAN Compression [10], our method reduces the computation of GauGAN by 50×. See Figure 1 for some examples of our method. With SIGE, we accelerate DDIM 3.0× on RTX 3090 GPU and 6.6× on Apple M1 Pro CPU, and GauGAN 4.2× on RTX 3090 GPU and 38× on Apple M1 Pro CPU. |
| Researcher Affiliation | Academia | ¹Carnegie Mellon University, ²Massachusetts Institute of Technology, ³Stanford University |
| Pseudocode | No | The paper describes the algorithms and pipelines in text and diagrams (Figure 5) but does not include a formal pseudocode block or algorithm listing. |
| Open Source Code | Yes | Our code and benchmarks are available at https://github.com/lmxyy/sige. |
| Open Datasets | Yes | To evaluate our method, we automatically create new image editing benchmark datasets on LSUN Church [11] and Cityscapes [12]. |
| Dataset Splits | Yes | The training and validation sets consist of 2975 and 500 images, respectively. Our editing dataset has 1505 editing pairs in total. We evaluate GauGAN [8] on this dataset. ... On LSUN Church, we only use 431 synthetic images for the PSNR/LPIPS with G.T. metrics, as manual editing does not have ground truths. For the other metrics, we use the entire LSUN Church dataset (431 synthetic + 23 manual). |
| Hardware Specification | Yes | With SIGE, we accelerate DDIM 3.0× on RTX 3090 GPU and 6.6× on Apple M1 Pro CPU, and GauGAN 4.2× on RTX 3090 GPU and 38× on Apple M1 Pro CPU. |
| Software Dependencies | Yes | The latency is measured in PyTorch 1.7. ... We benchmark the results with TensorRT 8.4. |
| Experiment Setup | Yes | Implementation details. The number of denoising steps for DDIM and PD are 100 and 8, respectively, and we use 50 and 5 steps for SDEdit. We dilate the difference mask by 5, 2, 5, and 1 pixels for DDIM, PD with resolution 128, PD with resolution 256, and GauGAN, respectively. Besides, we apply our sparse kernel to all convolution layers whose input feature map resolutions are larger than 32×32, 16×16, 8×16, and 16×32 for DDIM, PD, original GauGAN, and GAN Compression, respectively. For DDIM [5] and PD [13], we pre-compute and reuse the statistics of the original image for all group normalization layers [84]. For GAN Compression [10], we pre-compute and reuse the statistics of the original image for all instance normalization layers [82] whose resolution is higher than 16×32. For all models, the sparse block size for 3×3 convolution is 6 and for 1×1 convolution is 4. (Illustrative sketches of the mask dilation and statistics reuse described here follow the table.) |
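
To make the mask-dilation step in the experiment setup concrete, here is a minimal PyTorch sketch of computing a difference mask from an original/edited image pair and dilating it by a fixed pixel radius. This is not the authors' SIGE code: the function names, the zero threshold, and the max-pooling dilation operator are illustrative assumptions; only the 5-pixel dilation value for DDIM is taken from the paper.

```python
import torch
import torch.nn.functional as F

def difference_mask(original: torch.Tensor, edited: torch.Tensor,
                    threshold: float = 0.0) -> torch.Tensor:
    """Binary mask marking pixels where the edited image differs from the original.

    original, edited: (N, C, H, W) tensors in the same value range.
    """
    # Per-pixel maximum absolute difference across channels.
    diff = (original - edited).abs().amax(dim=1, keepdim=True)
    return (diff > threshold).float()

def dilate_mask(mask: torch.Tensor, pixels: int) -> torch.Tensor:
    """Dilate a binary (N, 1, H, W) mask by `pixels` via max-pooling."""
    if pixels <= 0:
        return mask
    kernel = 2 * pixels + 1
    return F.max_pool2d(mask, kernel_size=kernel, stride=1, padding=pixels)

# Example: the paper dilates the DDIM difference mask by 5 pixels.
original = torch.rand(1, 3, 256, 256)
edited = original.clone()
edited[:, :, 100:150, 100:150] = torch.rand(1, 3, 50, 50)  # a local edit
mask = dilate_mask(difference_mask(original, edited), pixels=5)
```

Max-pooling with a (2p+1)-sized kernel is a standard GPU-friendly way to dilate a binary mask by p pixels; the paper states only the dilation radii, not the operator used.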
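The setup also pre-computes and reuses the normalization statistics of the original image for group (or instance) normalization layers. Below is a minimal sketch of one way to realize this for GroupNorm, split into a capture pass on the original image and a reuse pass on edits. The class name, the `capture`/`forward` split, and all shapes are hypothetical assumptions, not the paper's implementation.

```python
import torch
import torch.nn as nn

class ReusedStatsGroupNorm(nn.Module):
    """GroupNorm variant that normalizes with statistics frozen from a reference pass."""

    def __init__(self, num_groups: int, num_channels: int, eps: float = 1e-5):
        super().__init__()
        self.num_groups, self.eps = num_groups, eps
        self.weight = nn.Parameter(torch.ones(num_channels))
        self.bias = nn.Parameter(torch.zeros(num_channels))
        self.mean = None  # (N, G) statistics captured from the original image
        self.var = None

    @torch.no_grad()
    def capture(self, x: torch.Tensor) -> None:
        """Record per-group statistics from the original (unedited) input."""
        n = x.shape[0]
        groups = x.reshape(n, self.num_groups, -1)
        self.mean = groups.mean(dim=2)
        self.var = groups.var(dim=2, unbiased=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        assert self.mean is not None, "call capture() on the original image first"
        n, c, h, w = x.shape
        groups = x.reshape(n, self.num_groups, -1)
        # Reuse the frozen statistics instead of recomputing them per edit.
        groups = (groups - self.mean.unsqueeze(-1)) / (self.var.unsqueeze(-1) + self.eps).sqrt()
        x = groups.reshape(n, c, h, w)
        return x * self.weight.view(1, -1, 1, 1) + self.bias.view(1, -1, 1, 1)

# Usage: capture statistics on the original image once, reuse on every edit.
gn = ReusedStatsGroupNorm(num_groups=32, num_channels=64)
gn.capture(torch.randn(1, 64, 32, 32))   # pre-compute on the original image
out = gn(torch.randn(1, 64, 32, 32))     # edited pass reuses the frozen stats
```

Freezing the statistics avoids recomputing a full-image reduction when only a small edited region changes, which is the motivation the paper gives for reusing the original image's statistics.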