SnapFusion: Text-to-Image Diffusion Model on Mobile Devices within Two Seconds
Authors: Yanyu Li, Huan Wang, Qing Jin, Ju Hu, Pavlo Chemerys, Yun Fu, Yanzhi Wang, Sergey Tulyakov, Jian Ren
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our extensive experiments on MS-COCO show that our model with 8 denoising steps achieves better FID and CLIP scores than Stable Diffusion v1.5 with 50 steps. |
| Researcher Affiliation | Collaboration | Snap Inc.; Northeastern University |
| Pseudocode | Yes | Algorithm 1 Optimizing UNet Architecture |
| Open Source Code | Yes | Project Page: https://snap-research.github.io/SnapFusion |
| Open Datasets | Yes | Our extensive experiments on MS-COCO show that our model with 8 denoising steps achieves better FID and CLIP scores than Stable Diffusion v1.5 with 50 steps. |
| Dataset Splits | Yes | We use a small subset (2K images) of MS-COCO validation set [50], fixed steps (50), and CFG scale as 7.5 to benchmark the score, and it takes about 2.5 A100 GPU hours to test each action. |
| Hardware Specification | Yes | Table 1: Latency Comparison between Stable Diffusion v1.5 and our proposed efficient diffusion models (UNet and Image Decoder) on iPhone 14 Pro. |
| Software Dependencies | No | The paper mentions the 'diffusers library' and 'TensorRT [64] library' but does not specify their version numbers or other software dependencies with specific versions required for reproducibility. |
| Experiment Setup | Yes | We use AdamW optimizer [52], set weight decay as 0.01, and apply training batch size as 2,048. |
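
The evaluation protocol quoted in the Dataset Splits row (a 2K subset of MS-COCO validation captions, 50 denoising steps, CFG scale 7.5) can be approximated with the diffusers library. The sketch below is a minimal, non-authoritative illustration of that benchmark loop; the checkpoint ID, the caption loader, and the FID/CLIP helpers are assumptions, not code released by the authors.

```python
# Minimal sketch of the reported benchmark settings: Stable Diffusion v1.5,
# 50 denoising steps, CFG scale 7.5, on a 2K subset of MS-COCO captions.
# The checkpoint ID, caption loader, and metric helpers are assumptions.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",  # assumed public SD v1.5 checkpoint
    torch_dtype=torch.float16,
).to("cuda")

def generate(captions, steps=50, cfg_scale=7.5):
    """Generate one image per caption with the benchmark settings."""
    return [
        pipe(c, num_inference_steps=steps, guidance_scale=cfg_scale).images[0]
        for c in captions
    ]

# captions = load_coco_val_captions(n=2000)            # hypothetical loader for the 2K subset
# images = generate(captions)
# fid, clip = compute_fid(images), compute_clip(images, captions)  # hypothetical metric helpers
```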
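
The Experiment Setup row reports AdamW with weight decay 0.01 and a training batch size of 2,048. A minimal PyTorch sketch of that optimizer configuration follows; the learning rate, the placeholder UNet module, and the device-sharding numbers are assumptions not stated in this table.

```python
# Minimal sketch of the reported optimizer settings (AdamW, weight decay 0.01,
# global batch size 2,048). The UNet stand-in, learning rate, and sharding
# numbers are assumptions for illustration only.
import torch
import torch.nn as nn

unet = nn.Conv2d(4, 4, kernel_size=3, padding=1)  # placeholder for the efficient UNet

optimizer = torch.optim.AdamW(
    unet.parameters(),
    lr=1e-4,            # assumed; not reported in this table
    weight_decay=0.01,  # reported in the paper
)

# A global batch size of 2,048 is usually realized as
# per-device batch x number of devices (x gradient accumulation).
per_device_batch, num_devices = 32, 64
assert per_device_batch * num_devices == 2048
```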