PeRFlow: Piecewise Rectified Flow as Universal Plug-and-Play Accelerator

Authors: Hanshu Yan, Xingchao Liu, Jiachun Pan, Jun Hao Liew, Qiang Liu, Jiashi Feng

NeurIPS 2024

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | We conducted extensive experiments to verify the effectiveness of PeRFlow on accelerating pretrained diffusion models, including Stable Diffusion (SD) 1.5, SD 2.1, SDXL [32], and AnimateDiff [6]. PeRFlow shows advantages in terms of FID values, visual quality, and generation diversity. |
| Researcher Affiliation | Collaboration | Hanshu Yan*, Xingchao Liu+, Jiachun Pan#, Jun Hao Liew*, Qiang Liu+, Jiashi Feng*. *ByteDance, +University of Texas at Austin, #National University of Singapore |
| Pseudocode | Yes | Algorithm 1: Piecewise Rectified Flow (a minimal training sketch appears after the table) |
| Open Source Code | Yes | Code for training and inference has been publicly released: https://github.com/magic-research/piecewise-rectified-flow |
| Open Datasets | Yes | Images are all sampled from the LAION-Aesthetics-5+ dataset [37] and center-cropped. We compute the FID values of PeRFlow-accelerated SDs in Table 1 against three reference distributions: (1) LAION-5B-Aesthetics [37], the training set of PeRFlow and the other methods; (2) the MS COCO 2014 [17] validation set; (3) images generated from SDv1.5/XL with JourneyDB [41] prompts. (A hedged FID sketch follows the table.) |
| Dataset Splits | No | The paper uses the MS COCO 2014 [17] validation set as a reference distribution for FID, but it does not specify training/validation/test splits (percentages or sample counts) for its main training data. |
| Hardware Specification | Yes | All experiments are conducted with 16 NVIDIA A100 GPUs. |
| Software Dependencies | No | The paper mentions "Hugging Face scripts for training Stable Diffusion" but does not provide version numbers for software dependencies (e.g., Python, PyTorch, the Diffusers library). |
| Experiment Setup | Yes | PeRFlow-SD-v1.5 is trained on images at 512×512 resolution using ϵ-prediction as defined in (7). We randomly drop the text captions with a low probability (10%) to enable classifier-free guidance during sampling. We divide the time range [0, 1] uniformly into four windows. For each window, we use the DDIM solver to compute the endpoints with 8 steps. (A short setup sketch follows the table.) |
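Referenced from the Pseudocode row above, below is a minimal, self-contained sketch of one piecewise-rectified-flow training step. It is a toy illustration under stated assumptions, not the paper's implementation: `v_theta`, `teacher_ddim_solve`, and the linear noising schedule are hypothetical stand-ins for the student UNet, the pretrained teacher's DDIM solve, and the actual diffusion schedule.

```python
import torch
import torch.nn as nn

K = 4                                    # number of time windows on [0, 1]
edges = torch.linspace(0.0, 1.0, K + 1)  # window endpoints t_0 < ... < t_K

# Hypothetical student velocity field v_theta(z, t); the real student is a UNet
# initialized from the pretrained diffusion model.
v_theta = nn.Sequential(nn.Linear(3, 64), nn.SiLU(), nn.Linear(64, 2))

def teacher_ddim_solve(z_start, t_start, t_end, steps=8):
    """Placeholder for the teacher's DDIM solve between window endpoints
    (the paper uses 8 DDIM steps per window); returns z at t_end."""
    return z_start  # stand-in; a real implementation queries the pretrained model

opt = torch.optim.Adam(v_theta.parameters(), lr=1e-4)
for _ in range(100):
    z_data = torch.randn(32, 2)           # stand-in for encoded training images
    k = torch.randint(0, K, (1,)).item()  # sample one time window per iteration
    t_s, t_e = edges[k].item(), edges[k + 1].item()

    # Diffuse data to the window start (illustrative linear schedule), then let
    # the teacher solve to the window end.
    z_s = (1.0 - t_s) * torch.randn_like(z_data) + t_s * z_data
    z_e = teacher_ddim_solve(z_s, t_s, t_e)

    # Fit the student to the straight-line (rectified) velocity inside the window.
    tau = torch.rand(32, 1) * (t_e - t_s) + t_s
    z_tau = z_s + (tau - t_s) / (t_e - t_s) * (z_e - z_s)
    target_v = (z_e - z_s) / (t_e - t_s)
    pred_v = v_theta(torch.cat([z_tau, tau], dim=1))
    loss = ((pred_v - target_v) ** 2).mean()

    opt.zero_grad(); loss.backward(); opt.step()
```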
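For the FID evaluation described in the Open Datasets row, here is a hedged sketch using torchmetrics' `FrechetInceptionDistance`. The random uint8 tensors stand in for loaded reference images (e.g., the MS COCO 2014 validation set) and PeRFlow samples; real code would load and center-crop images instead.

```python
import torch
from torchmetrics.image.fid import FrechetInceptionDistance

fid = FrechetInceptionDistance(feature=2048)  # Inception-v3 pool features

# Placeholder batches of uint8 images in (N, 3, H, W); replace with real
# reference images and model samples (many more than 16 in practice).
real = torch.randint(0, 256, (16, 3, 299, 299), dtype=torch.uint8)
fake = torch.randint(0, 256, (16, 3, 299, 299), dtype=torch.uint8)

fid.update(real, real=True)
fid.update(fake, real=False)
print(fid.compute())  # lower is better
```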
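The Experiment Setup row translates into a few concrete pieces, sketched below: the four uniform time windows, the per-window 8-step DDIM budget, and the 10% caption dropout that enables classifier-free guidance. `maybe_drop_caption` is a hypothetical helper, not taken from the released code.

```python
import random
import torch

# Four uniform time windows on [0, 1], as described for PeRFlow-SD-v1.5.
edges = torch.linspace(0.0, 1.0, 5)
windows = list(zip(edges[:-1].tolist(), edges[1:].tolist()))
print(windows)  # [(0.0, 0.25), (0.25, 0.5), (0.5, 0.75), (0.75, 1.0)]

# Each window's endpoint is solved with an 8-step DDIM call in the paper.
DDIM_STEPS_PER_WINDOW = 8

def maybe_drop_caption(caption: str, p: float = 0.10) -> str:
    """Drop the text caption with probability p (10% in the paper) so the model
    also learns the unconditional branch needed for classifier-free guidance."""
    return "" if random.random() < p else caption
```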