AsyncDiff: Parallelizing Diffusion Models by Asynchronous Denoising
Authors: Zigeng Chen, Xinyin Ma, Gongfan Fang, Zhenxiong Tan, Xinchao Wang
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We validated the broad applicability of AsyncDiff through extensive testing on several diffusion models. For text-to-image tasks, we experimented with three versions of Stable Diffusion: SD 1.5, SD 2.1 [43], and Stable Diffusion XL (SDXL) [41]. Additionally, we explored the effectiveness of AsyncDiff on video diffusion models using Stable Video Diffusion (SVD) [2] and AnimateDiff [9]. |
| Researcher Affiliation | Academia | Zigeng Chen, Xinyin Ma, Gongfan Fang, Zhenxiong Tan, Xinchao Wang; National University of Singapore; zigeng99@n.nus.edu, xinchao@nus.edu.sg |
| Pseudocode | No | The paper contains diagrams illustrating the asynchronous denoising process but no formal pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code is available at https://github.com/czg1225/AsyncDiff |
| Open Datasets | Yes | We assess the zero-shot generation capability using the MS-COCO 2017 [29] validation set, which comprises 5,000 images and captions. |
| Dataset Splits | Yes | We assess the zero-shot generation capability using the MS-COCO 2017 [29] validation set, which comprises 5,000 images and captions. |
| Hardware Specification | Yes | All latency measurements were conducted on NVIDIA A5000 GPUs equipped with NVLINK Bridge. We tested inference speeds on the professional-grade NVIDIA RTX A5000, as well as the consumer-grade NVIDIA RTX 2080 Ti and NVIDIA RTX 3090 GPUs. |
| Software Dependencies | No | The paper mentions using 'torch.distributed' and 'NVIDIA Collective Communication Library (NCCL) backend' but does not specify version numbers for these or any other software dependencies. |
| Experiment Setup | Yes | All models were evaluated using 50 DDIM steps. In this context, N represents the number of segments into which the denoising model is divided, and S denotes the stride of denoising for each parallel computation batch. We also explore the effect of warm-up steps. |
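The setup row above can be made concrete with a back-of-the-envelope cost model. The sketch below is an illustrative assumption, not the paper's own formula: it supposes that during the warm-up steps all N model segments run sequentially, and that afterwards the N segments run concurrently on N devices, so each remaining denoising step costs roughly one segment's latency (communication overhead ignored).

```python
def theoretical_speedup(num_steps: int, n_segments: int, warmup: int = 1) -> float:
    """Rough speedup estimate for asynchronous denoising.

    Costs are measured in segment-latency units:
    - Sequential baseline: every step runs all N segments in order.
    - Async: warm-up steps stay sequential; the rest overlap, so each
      remaining step costs about one segment's latency.
    """
    sequential_cost = num_steps * n_segments
    async_cost = warmup * n_segments + (num_steps - warmup)
    return sequential_cost / async_cost


# Example matching the reported setup: 50 DDIM steps, model split
# into N = 4 segments, a single warm-up step.
print(round(theoretical_speedup(50, 4, warmup=1), 2))  # → 3.77
```

Real speedups will be lower because inter-device communication (the NCCL all-to-all traffic mentioned in the dependencies row) and any load imbalance between segments are not modeled here.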