AsyncDiff: Parallelizing Diffusion Models by Asynchronous Denoising

Authors: Zigeng Chen, Xinyin Ma, Gongfan Fang, Zhenxiong Tan, Xinchao Wang

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We validated the broad applicability of AsyncDiff through extensive testing on several diffusion models. For text-to-image tasks, we experimented with three versions of Stable Diffusion: SD 1.5, SD 2.1 [43], and Stable Diffusion XL (SDXL) [41]. Additionally, we explored the effectiveness of AsyncDiff on video diffusion models using Stable Video Diffusion (SVD) [2] and AnimateDiff [9].
Researcher Affiliation | Academia | Zigeng Chen, Xinyin Ma, Gongfan Fang, Zhenxiong Tan, Xinchao Wang, National University of Singapore. zigeng99@u.nus.edu, xinchao@nus.edu.sg
Pseudocode | No | The paper contains diagrams illustrating the asynchronous denoising process but no formal pseudocode or algorithm blocks.
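Since the paper provides no algorithm block, the following is a loose conceptual sketch (not the authors' code) of the asynchronous denoising idea: the denoising network is split into N sequential segments, and after a sequential warm-up pass, each segment consumes its predecessor's output cached from the previous timestep, breaking the intra-step dependency chain so the segments can run on separate GPUs. The function names, the scheduler interface, and the omission of the stride S are simplifying assumptions.

```python
# Conceptual sketch only -- not the authors' implementation.
def async_denoise(x_T, segments, scheduler, num_steps, warmup=1):
    x = x_T
    cache = [None] * len(segments)   # per-segment outputs from the prior step
    for i, t in enumerate(reversed(range(num_steps))):
        if i < warmup:
            # Warm-up: ordinary fully sequential pass through all segments.
            h = x
            for k, seg in enumerate(segments):
                h = seg(h, t)
                cache[k] = h
            eps = h
        else:
            # Asynchronous step: segment k reads the stale output of
            # segment k-1 from the previous timestep, so this loop can
            # execute concurrently, one segment per device.
            cache = [seg(x if k == 0 else cache[k - 1], t)
                     for k, seg in enumerate(segments)]
            eps = cache[-1]
        x = scheduler.step(eps, t, x)  # standard DDIM-style update
    return x
```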
Open Source Code | Yes | Code is available at https://github.com/czg1225/AsyncDiff
Open Datasets | Yes | We assess the zero-shot generation capability using the MS-COCO 2017 [29] validation set, which comprises 5,000 images and captions.
Dataset Splits | Yes | We assess the zero-shot generation capability using the MS-COCO 2017 [29] validation set, which comprises 5,000 images and captions.
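For context, the reported split is typically loaded with the standard pycocotools API; the annotation path below follows the usual COCO download layout and is an assumption, not something the paper specifies.

```python
from pycocotools.coco import COCO

# Load caption annotations for the MS-COCO 2017 validation split
# (5,000 images); the path assumes the standard download layout.
coco = COCO("annotations/captions_val2017.json")
img_ids = coco.getImgIds()
ann_ids = coco.getAnnIds(imgIds=img_ids[0])
captions = [ann["caption"] for ann in coco.loadAnns(ann_ids)]
print(len(img_ids), "images; first caption:", captions[0])
```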
Hardware Specification | Yes | All latency measurements were conducted on NVIDIA A5000 GPUs equipped with an NVLink Bridge. We tested inference speeds on the professional-grade NVIDIA RTX A5000, as well as the consumer-grade NVIDIA RTX 2080 Ti and NVIDIA RTX 3090 GPUs.
Software Dependencies | No | The paper mentions using 'torch.distributed' and the 'NVIDIA Collective Communication Library (NCCL) backend' but does not specify version numbers for these or any other software dependencies.
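While versions go unspecified, the distributed setup the paper names is conventionally initialized as below; this is a minimal sketch, and the launcher command in the comment is an assumption.

```python
import torch
import torch.distributed as dist

# Initialize the default process group with the NCCL backend, the
# communication library the paper reports using via torch.distributed.
# Typically launched with, e.g.: torchrun --nproc_per_node=4 script.py
dist.init_process_group(backend="nccl")
rank = dist.get_rank()
torch.cuda.set_device(rank)  # one GPU per process / model segment
print(f"rank {rank} of {dist.get_world_size()} initialized")
```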
Experiment Setup | Yes | All models were evaluated using 50 DDIM steps. In this context, N represents the number of segments into which the denoising model is divided, and S denotes the stride of denoising for each parallel computation batch. We also explore the effect of warm-up steps.
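A minimal sketch of the reported evaluation setting using the diffusers library: the model ID and prompt are illustrative assumptions, and the AsyncDiff-specific N and S parameters would wrap a pipeline like this rather than appear in it. Only the 50 DDIM steps come from the paper.

```python
import torch
from diffusers import StableDiffusionPipeline, DDIMScheduler

# Build an SD 2.1 pipeline with a DDIM scheduler, matching the paper's
# 50-step DDIM evaluation. N (model segments) and S (denoising stride)
# are AsyncDiff-specific settings applied on top of such a pipeline.
pipe = StableDiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-1", torch_dtype=torch.float16
).to("cuda")
pipe.scheduler = DDIMScheduler.from_config(pipe.scheduler.config)

image = pipe("a photo of a corgi riding a skateboard",
             num_inference_steps=50).images[0]
image.save("sample.png")
```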