PeRFlow: Piecewise Rectified Flow as Universal Plug-and-Play Accelerator

Authors: Hanshu Yan, Xingchao Liu, Jiachun Pan, Jun Hao Liew, Qiang Liu, Jiashi Feng

NeurIPS 2024

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | We conducted extensive experiments to verify the effectiveness of PeRFlow on accelerating pretrained diffusion models, including Stable Diffusion (SD) 1.5, SD 2.1, SDXL [32], and AnimateDiff [6]. PeRFlow shows advantages in terms of FID values, visual quality, and generation diversity. |
| Researcher Affiliation | Collaboration | Hanshu Yan*, Xingchao Liu+, Jiachun Pan#, Jun Hao Liew*, Qiang Liu+, Jiashi Feng*. *ByteDance, +University of Texas at Austin, #National University of Singapore |
| Pseudocode | Yes | Algorithm 1: Piecewise Rectified Flow (a minimal training sketch appears after the table) |
| Open Source Code | Yes | Code for training and inference has been publicly released: https://github.com/magic-research/piecewise-rectified-flow |
| Open Datasets | Yes | Images are all sampled from the LAION-Aesthetics-5+ dataset [37] and center-cropped. We compute the FID values of PeRFlow-accelerated SDs in Table 1 against three reference distributions: (1) LAION-5B-Aesthetics [37], the training set of PeRFlow and the other methods; (2) the MS COCO 2014 [17] validation set; (3) images generated from SDv1.5/XL with JourneyDB [41] prompts. (A hedged FID sketch follows the table.) |
| Dataset Splits | No | The paper uses the MS COCO 2014 [17] validation set as a reference distribution for FID, but it does not specify training/validation/test splits (percentages or sample counts) for its main training data. |
| Hardware Specification | Yes | All experiments are conducted with 16 NVIDIA A100 GPUs. |
| Software Dependencies | No | The paper mentions "Hugging Face scripts for training Stable Diffusion" but does not provide version numbers for software dependencies (e.g., Python, PyTorch, the Diffusers library). |
| Experiment Setup | Yes | PeRFlow-SD-v1.5 is trained on images at 512×512 resolution using ϵ-prediction as defined in (7). We randomly drop the text captions with a low probability (10%) to enable classifier-free guidance during sampling. We divide the time range [0, 1] uniformly into four windows. For each window, we use the DDIM solver to compute the endpoints with 8 steps. (A short setup sketch follows the table.) |
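Referenced from the Pseudocode row above, below is a minimal, self-contained sketch of one piecewise-rectified-flow training step. It is a toy illustration under stated assumptions, not the paper's implementation: `v_theta`, `teacher_ddim_solve`, and the linear noising schedule are hypothetical stand-ins for the student UNet, the pretrained teacher's DDIM solve, and the actual diffusion schedule.

```python
import torch
import torch.nn as nn

K = 4                                    # number of time windows on [0, 1]
edges = torch.linspace(0.0, 1.0, K + 1)  # window endpoints t_0 < ... < t_K

# Hypothetical student velocity field v_theta(z, t); the real student is a UNet
# initialized from the pretrained diffusion model.
v_theta = nn.Sequential(nn.Linear(3, 64), nn.SiLU(), nn.Linear(64, 2))

def teacher_ddim_solve(z_start, t_start, t_end, steps=8):
    """Placeholder for the teacher's DDIM solve between window endpoints
    (the paper uses 8 DDIM steps per window); returns z at t_end."""
    return z_start  # stand-in; a real implementation queries the pretrained model

opt = torch.optim.Adam(v_theta.parameters(), lr=1e-4)
for _ in range(100):
    z_data = torch.randn(32, 2)           # stand-in for encoded training images
    k = torch.randint(0, K, (1,)).item()  # sample one time window per iteration
    t_s, t_e = edges[k].item(), edges[k + 1].item()

    # Diffuse data to the window start (illustrative linear schedule), then let
    # the teacher solve to the window end.
    z_s = (1.0 - t_s) * torch.randn_like(z_data) + t_s * z_data
    z_e = teacher_ddim_solve(z_s, t_s, t_e)

    # Fit the student to the straight-line (rectified) velocity inside the window.
    tau = torch.rand(32, 1) * (t_e - t_s) + t_s
    z_tau = z_s + (tau - t_s) / (t_e - t_s) * (z_e - z_s)
    target_v = (z_e - z_s) / (t_e - t_s)
    pred_v = v_theta(torch.cat([z_tau, tau], dim=1))
    loss = ((pred_v - target_v) ** 2).mean()

    opt.zero_grad(); loss.backward(); opt.step()
```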
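For the FID evaluation described in the Open Datasets row, here is a hedged sketch using torchmetrics' `FrechetInceptionDistance`. The random uint8 tensors stand in for loaded reference images (e.g., the MS COCO 2014 validation set) and PeRFlow samples; real code would load and center-crop images instead.

```python
import torch
from torchmetrics.image.fid import FrechetInceptionDistance

fid = FrechetInceptionDistance(feature=2048)  # Inception-v3 pool features

# Placeholder batches of uint8 images in (N, 3, H, W); replace with real
# reference images and model samples (many more than 16 in practice).
real = torch.randint(0, 256, (16, 3, 299, 299), dtype=torch.uint8)
fake = torch.randint(0, 256, (16, 3, 299, 299), dtype=torch.uint8)

fid.update(real, real=True)
fid.update(fake, real=False)
print(fid.compute())  # lower is better
```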
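The Experiment Setup row translates into a few concrete pieces, sketched below: the four uniform time windows, the per-window 8-step DDIM budget, and the 10% caption dropout that enables classifier-free guidance. `maybe_drop_caption` is a hypothetical helper, not taken from the released code.

```python
import random
import torch

# Four uniform time windows on [0, 1], as described for PeRFlow-SD-v1.5.
edges = torch.linspace(0.0, 1.0, 5)
windows = list(zip(edges[:-1].tolist(), edges[1:].tolist()))
print(windows)  # [(0.0, 0.25), (0.25, 0.5), (0.5, 0.75), (0.75, 1.0)]

# Each window's endpoint is solved with an 8-step DDIM call in the paper.
DDIM_STEPS_PER_WINDOW = 8

def maybe_drop_caption(caption: str, p: float = 0.10) -> str:
    """Drop the text caption with probability p (10% in the paper) so the model
    also learns the unconditional branch needed for classifier-free guidance."""
    return "" if random.random() < p else caption
```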