Self-Refining Diffusion Samplers: Enabling Parallelization via Parareal Iterations

Authors: Nikil Selvam, Amil Merchant, Stefano Ermon

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "As we demonstrate for pre-trained diffusion models, the early convergence of this refinement procedure drastically reduces the number of steps required to produce a sample, speeding up generation for instance by up to 1.7x on a 25-step Stable Diffusion-v2 benchmark and up to 4.3x on longer trajectories."
Researcher Affiliation | Academia | "Nikil Roashan Selvam, Amil Merchant, Stefano Ermon. Department of Computer Science, Stanford University. {nrs,amil,ermon}@cs.stanford.edu"
Pseudocode | Yes | "Algorithm 1 SRDS: Self-Refining Diffusion Sampler"
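The paper's Algorithm 1 is not reproduced here. As context for the "Parareal Iterations" in the title, the sketch below applies textbook Parareal to a toy ODE: a cheap coarse solver sweeps serially while accurate fine solves run independently per interval and are folded in via the correction U_{k+1}^{n+1} = G(U_{k+1}^n) + F(U_k^n) - G(U_k^n). All function names and parameter values are illustrative assumptions, not the authors' SRDS implementation.

```python
import numpy as np

def euler(y, t, dt, steps, f):
    """Explicit Euler: advance y over [t, t + dt] in `steps` substeps."""
    h = dt / steps
    for _ in range(steps):
        y = y + h * f(t, y)
        t += h
    return y

def parareal(f, y0, t0, t1, n_intervals=8, fine_steps=100, n_iters=4):
    """Textbook Parareal iteration (illustrative, not the paper's SRDS):
        U_{k+1}^{n+1} = G(U_{k+1}^n) + F(U_k^n) - G(U_k^n)
    with a one-step Euler coarse solver G and a multi-step Euler fine
    solver F. The fine solves in each iteration are independent across
    intervals, which is what makes the scheme parallelizable."""
    ts = np.linspace(t0, t1, n_intervals + 1)
    dt = ts[1] - ts[0]
    # Serial coarse sweep for the initial guess.
    U = [y0]
    for n in range(n_intervals):
        U.append(euler(U[n], ts[n], dt, 1, f))
    for _ in range(n_iters):
        # These fine solves could run in parallel (one per interval).
        F_vals = [euler(U[n], ts[n], dt, fine_steps, f) for n in range(n_intervals)]
        G_old = [euler(U[n], ts[n], dt, 1, f) for n in range(n_intervals)]
        U_new = [y0]
        for n in range(n_intervals):
            G_new = euler(U_new[n], ts[n], dt, 1, f)
            U_new.append(G_new + F_vals[n] - G_old[n])
        U = U_new
    return U
```

On the toy problem dy/dt = -y, a few Parareal iterations recover the fine serial solution over all intervals at a fraction of the serial fine-solve wallclock when the fine solves are parallelized.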
Open Source Code | Yes | "Code for our paper can be found at https://github.com/nikilrselvam/srds."
Open Datasets | Yes | "In particular, we test our SRDS algorithm and demonstrate capabilities in performing diffusion directly on the pixel space of 128x128 LSUN Church and Bedroom [41], 64x64 ImageNet [5], and 32x32 CIFAR [16] using pretrained diffusion models [29], which all use N = 1024 length diffusion trajectories."
Dataset Splits | No | The paper does not provide explicit training/validation/test splits (e.g., percentages or exact counts); it only references established datasets such as LSUN Church, ImageNet, CIFAR, and COCO2017 captions.
Hardware Specification | Yes | "Table 2: CLIP scores of SRDS on Stable Diffusion-v2 over 1000 samples from the COCO2017 captions dataset, with classifier guidance w = 7.5, evaluated on ViT-g-14. Time is measured on 4 A100 GPUs without pipeline parallelism, showcasing speedups with early convergence of the SRDS sample." "Table 4: Comparison of wallclock speedups offered by Pipelined SRDS and ParaDiGMS with various thresholds, with respect to Serial image generation. These Stable Diffusion experiments are performed on identical machines (4 40GB A100 GPUs) for a fair comparison."
Software Dependencies | No | The paper mentions `torch.multiprocessing` but does not provide version numbers for any software dependencies.
Experiment Setup | Yes | "We measure the convergence via l1 norm in pixel space with values [0, 255]. We conservatively set τ = 0.1, meaning that convergence occurs when on average each pixel in the generation differs by only 0.1 after a refinement step (see Appendix F for an ablation on choice of τ). Through our experiments, we quantitatively showcase how the SRDS algorithm can provide significant speedups in generation without degrading model quality (as measured by FID score [9] on 5000 samples)."
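The quoted convergence criterion, an l1 norm in pixel space with a per-pixel threshold τ = 0.1, amounts to a mean absolute per-pixel difference check. A minimal sketch, assuming the images are NumPy arrays with values in [0, 255]; the names `converged` and `TAU` are illustrative, only the τ = 0.1 value comes from the quote:

```python
import numpy as np

TAU = 0.1  # per-pixel convergence threshold quoted from the paper

def converged(prev_img, new_img, tau=TAU):
    """Return True when, on average, each pixel (values in [0, 255])
    changed by less than `tau` after a refinement step, i.e. the
    l1 norm of the difference divided by the number of pixels."""
    diff = new_img.astype(np.float64) - prev_img.astype(np.float64)
    return float(np.mean(np.abs(diff))) < tau
```

A conservative threshold like 0.1 on a 0-255 scale means refinement stops only once successive iterates are visually indistinguishable.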