SnapFusion: Text-to-Image Diffusion Model on Mobile Devices within Two Seconds
Authors: Yanyu Li, Huan Wang, Qing Jin, Ju Hu, Pavlo Chemerys, Yun Fu, Yanzhi Wang, Sergey Tulyakov, Jian Ren
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our extensive experiments on MS-COCO show that our model with 8 denoising steps achieves better FID and CLIP scores than Stable Diffusion v1.5 with 50 steps. |
| Researcher Affiliation | Collaboration | Snap Inc.; Northeastern University |
| Pseudocode | Yes | Algorithm 1 Optimizing UNet Architecture |
| Open Source Code | Yes | Project Page: https://snap-research.github.io/SnapFusion |
| Open Datasets | Yes | Our extensive experiments on MS-COCO show that our model with 8 denoising steps achieves better FID and CLIP scores than Stable Diffusion v1.5 with 50 steps. |
| Dataset Splits | Yes | We use a small subset (2K images) of MS-COCO validation set [50], fixed steps (50), and CFG scale as 7.5 to benchmark the score, and it takes about 2.5 A100 GPU hours to test each action. |
| Hardware Specification | Yes | Table 1: Latency Comparison between Stable Diffusion v1.5 and our proposed efficient diffusion models (UNet and Image Decoder) on iPhone 14 Pro. |
| Software Dependencies | No | The paper mentions the 'diffusers library' and 'TensorRT [64] library' but does not specify their version numbers or other software dependencies with specific versions required for reproducibility. |
| Experiment Setup | Yes | We use AdamW optimizer [52], set weight decay as 0.01, and apply training batch size as 2,048. |
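
The evaluation protocol quoted in the Dataset Splits row (a 2K subset of MS-COCO validation captions, 50 denoising steps, CFG scale 7.5) can be approximated with the diffusers library. The sketch below is a minimal, non-authoritative illustration of that benchmark loop; the checkpoint ID, the caption loader, and the FID/CLIP helpers are assumptions, not code released by the authors.

```python
# Minimal sketch of the reported benchmark settings: Stable Diffusion v1.5,
# 50 denoising steps, CFG scale 7.5, on a 2K subset of MS-COCO captions.
# The checkpoint ID, caption loader, and metric helpers are assumptions.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",  # assumed public SD v1.5 checkpoint
    torch_dtype=torch.float16,
).to("cuda")

def generate(captions, steps=50, cfg_scale=7.5):
    """Generate one image per caption with the benchmark settings."""
    return [
        pipe(c, num_inference_steps=steps, guidance_scale=cfg_scale).images[0]
        for c in captions
    ]

# captions = load_coco_val_captions(n=2000)            # hypothetical loader for the 2K subset
# images = generate(captions)
# fid, clip = compute_fid(images), compute_clip(images, captions)  # hypothetical metric helpers
```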
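
The Experiment Setup row reports AdamW with weight decay 0.01 and a training batch size of 2,048. A minimal PyTorch sketch of that optimizer configuration follows; the learning rate, the placeholder UNet module, and the device-sharding numbers are assumptions not stated in this table.

```python
# Minimal sketch of the reported optimizer settings (AdamW, weight decay 0.01,
# global batch size 2,048). The UNet stand-in, learning rate, and sharding
# numbers are assumptions for illustration only.
import torch
import torch.nn as nn

unet = nn.Conv2d(4, 4, kernel_size=3, padding=1)  # placeholder for the efficient UNet

optimizer = torch.optim.AdamW(
    unet.parameters(),
    lr=1e-4,            # assumed; not reported in this table
    weight_decay=0.01,  # reported in the paper
)

# A global batch size of 2,048 is usually realized as
# per-device batch x number of devices (x gradient accumulation).
per_device_batch, num_devices = 32, 64
assert per_device_batch * num_devices == 2048
```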