Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

SnapFusion: Text-to-Image Diffusion Model on Mobile Devices within Two Seconds

Authors: Yanyu Li, Huan Wang, Qing Jin, Ju Hu, Pavlo Chemerys, Yun Fu, Yanzhi Wang, Sergey Tulyakov, Jian Ren

NeurIPS 2023 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our extensive experiments on MS-COCO show that our model with 8 denoising steps achieves better FID and CLIP scores than Stable Diffusion v1.5 with 50 steps.
Researcher Affiliation | Collaboration | 1 Snap Inc.; 2 Northeastern University
Pseudocode | Yes | Algorithm 1: Optimizing UNet Architecture
Open Source Code | Yes | Project Page: https://snap-research.github.io/SnapFusion
Open Datasets | Yes | Our extensive experiments on MS-COCO show that our model with 8 denoising steps achieves better FID and CLIP scores than Stable Diffusion v1.5 with 50 steps.
Dataset Splits | Yes | We use a small subset (2K images) of MS-COCO validation set [50], fixed steps (50), and CFG scale as 7.5 to benchmark the score, and it takes about 2.5 A100 GPU hours to test each action.
Hardware Specification | Yes | Table 1: Latency Comparison between Stable Diffusion v1.5 and our proposed efficient diffusion models (UNet and Image Decoder) on iPhone 14 Pro.
Software Dependencies | No | The paper mentions 'diffusers library' and 'TensorRT [64] library' but does not specify their version numbers or other software dependencies with specific versions required for reproducibility.
Experiment Setup | Yes | We use AdamW optimizer [52], set weight decay as 0.01, and apply training batch size as 2,048.
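
For readers who want the quoted settings in one place, the following is a minimal sketch that collects the values reported in the rows above into a single record. The class name, field names, and the `apply_cfg` helper are hypothetical and do not come from the paper or its code. The guidance function shows the standard classifier-free guidance combination, which is assumed here to be what "CFG scale" refers to. Hyperparameters not quoted above, such as the learning rate, are deliberately left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReportedSettings:
    """Values quoted in the table above; all field names are hypothetical."""
    optimizer: str = "AdamW"
    weight_decay: float = 0.01
    train_batch_size: int = 2048
    eval_subset_size: int = 2000    # "2K images" of the MS-COCO validation set
    eval_denoising_steps: int = 50  # fixed steps used for benchmarking
    cfg_scale: float = 7.5


def apply_cfg(eps_uncond, eps_cond, scale):
    """Standard classifier-free guidance: uncond + scale * (cond - uncond).

    Operates on plain lists of floats for illustration only; a real
    pipeline would apply the same formula to noise-prediction tensors.
    """
    return [u + scale * (c - u) for u, c in zip(eps_uncond, eps_cond)]


settings = ReportedSettings()

# Toy two-element "noise predictions" to show the effect of the scale.
guided = apply_cfg([0.0, 1.0], [1.0, 1.0], settings.cfg_scale)
print(settings)
print(guided)
```

Where the conditional and unconditional predictions agree, the guided value is unchanged; where they differ, the difference is amplified by the scale.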