Self-Refining Diffusion Samplers: Enabling Parallelization via Parareal Iterations
Authors: Nikil Selvam, Amil Merchant, Stefano Ermon
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | As we demonstrate for pre-trained diffusion models, the early convergence of this refinement procedure drastically reduces the number of steps required to produce a sample, speeding up generation for instance by up to 1.7x on a 25-step Stable Diffusion-v2 benchmark and up to 4.3x on longer trajectories. |
| Researcher Affiliation | Academia | Nikil Roashan Selvam Amil Merchant Stefano Ermon Department of Computer Science Stanford University {nrs,amil,ermon}@cs.stanford.edu |
| Pseudocode | Yes | Algorithm 1 SRDS: Self-Refining Diffusion Sampler |
| Open Source Code | Yes | Code for our paper can be found at https://github.com/nikilrselvam/srds. |
| Open Datasets | Yes | In particular, we test our SRDS algorithm and demonstrate capabilities in performing diffusion directly on the pixel space of 128x128 LSUN Church and Bedroom [41], 64x64 Imagenet [5], and 32x32 CIFAR [16] using pretrained diffusion models [29], which all use N = 1024 length diffusion trajectories. |
| Dataset Splits | No | The paper does not explicitly provide specific training/validation/test dataset splits (e.g., percentages or exact counts) but rather mentions using established datasets like LSUN Church, Imagenet, CIFAR, and COCO2017 captions. |
| Hardware Specification | Yes | Table 2: CLIP scores of SRDS on Stable Diffusion-v2 over 1000 samples from the COCO2017 captions dataset, with classifier guidance w = 7.5, evaluated on ViT-g-14. Time is measured on 4 A100 GPUs without pipeline parallelism, showcasing speedups with early convergence of the SRDS sample. Table 4: Comparison of wallclock speedups offered by Pipelined SRDS and ParaDiGMS with various thresholds, with respect to Serial image generation. These Stable Diffusion experiments are performed on identical machines (4 40GB A100 GPUs) for a fair comparison. |
| Software Dependencies | No | The paper mentions `torch.multiprocessing` but does not provide specific version numbers for any software dependencies. |
| Experiment Setup | Yes | We measure the convergence via l1 norm in pixel space with values [0, 255]. We conservatively set τ = 0.1, meaning that convergence occurs when on average each pixel in the generation differs by only 0.1 after a refinement step (see Appendix F for an ablation on choice of τ). Through our experiments, we quantitatively showcase how the SRDS algorithm can provide significant speedups in generation without degrading model quality (as measured by FID score [9] on 5000 samples). |
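The quoted stopping rule (mean per-pixel l1 change below τ = 0.1 on a [0, 255] scale) can be sketched as follows. This is a minimal illustration of the criterion as described, not the authors' implementation; the function name and array handling are assumptions.

```python
import numpy as np

def has_converged(prev_sample: np.ndarray, new_sample: np.ndarray, tau: float = 0.1) -> bool:
    """SRDS-style stopping check: converged when the average absolute
    per-pixel change between successive refinement iterates is at most tau.
    Pixel values are assumed to lie in [0, 255]."""
    prev = prev_sample.astype(np.float64)
    new = new_sample.astype(np.float64)
    mean_change = np.mean(np.abs(new - prev))  # l1 norm averaged over all pixels
    return bool(mean_change <= tau)
```

In a refinement loop, this check would be evaluated after each Parareal-style correction step, terminating early once successive iterates agree to within τ.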