FlowTurbo: Towards Real-time Flow-Based Image Generation with Velocity Refiner

Authors: Wenliang Zhao, Minglei Shi, Xumin Yu, Jie Zhou, Jiwen Lu

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We perform extensive experiments to evaluate our method. By applying FlowTurbo to different flow-based models, we obtain an acceleration ratio of 53.1%~58.3% on class-conditional generation and 29.8%~38.5% on text-to-image generation.
Researcher Affiliation | Academia | Wenliang Zhao, Department of Automation, Tsinghua University (wenliangzhao.thu@gmail.com); Minglei Shi, Department of Automation, Tsinghua University (stephenserrylei@gmail.com); Xumin Yu, Department of Automation, Tsinghua University (yuxumin98@gmail.com); Jie Zhou, Department of Automation, Tsinghua University (jzhou@tsinghua.edu.cn); Jiwen Lu, Department of Automation, Tsinghua University (lujiwen@tsinghua.edu.cn)
Pseudocode | Yes | Algorithm 1 (Heun's Method Sampler) and Algorithm 2 (Pseudo-Corrector Sampler) in Appendix B.
Open Source Code | Yes | Code is available at https://github.com/shiml20/FlowTurbo.
Open Datasets | Yes | For class-conditional image generation, we adopt a transformer-style flow-based model SiT-XL [24] pre-trained on ImageNet 256×256. We use ImageNet-1K [6] to train our velocity model. We use a subset of LAION [34] containing only 50K images to train our velocity model.
Dataset Splits | No | The paper mentions using the MS-COCO 2017 [16] validation set for FID calculation but does not explicitly state the train/validation splits used to train its own models or components.
Hardware Specification | Yes | In both tasks, we use a single NVIDIA A800 GPU to train the velocity refiner and find it converges within 6 hours. We use a batch size of 8 on a single A800 GPU to measure the latency of each method.
Software Dependencies | No | Our code is implemented in PyTorch. (The "6" following "PyTorch" in the paper is a footnote marker, not a version number; no specific PyTorch or other library versions are mentioned.)
Experiment Setup | Yes | Following common practice [24, 30], we adopt a classifier-free guidance scale (CFG) of 1.5. During training, we randomly sample t ∈ (0, 0.12] and compute the training objectives in (13). We use the AdamW [21] optimizer for all models. We use a constant learning rate of 5×10⁻⁵ and a batch size of 18 on a single A800 GPU. We use the AdamW [21] optimizer with a learning rate of 2e-5 and weight decay of 0.0. We adopt a batch size of 16 and set the warm-up steps as 100. We also use gradient clipping of 0.01 to stabilize training.
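The sampler and guidance settings referenced above (Heun's method, Algorithm 1; CFG scale 1.5) can be sketched in a minimal, framework-free form. This is an illustrative sketch, not the paper's implementation: the function names `heun_sample` and `cfg_velocity` are hypothetical, and the velocity fields are stand-ins for the pretrained model's conditional and unconditional outputs.

```python
def cfg_velocity(v_cond, v_uncond, x, t, scale=1.5):
    """Classifier-free guidance: push the conditional velocity away
    from the unconditional one by the guidance scale (1.5 in the paper)."""
    vu = v_uncond(x, t)
    return vu + scale * (v_cond(x, t) - vu)

def heun_sample(velocity, x0, n_steps):
    """Integrate dx/dt = velocity(x, t) from t = 0 to t = 1 with Heun's
    second-order predictor-corrector method."""
    dt = 1.0 / n_steps
    x, t = x0, 0.0
    for _ in range(n_steps):
        v1 = velocity(x, t)
        x_pred = x + dt * v1           # Euler predictor step
        v2 = velocity(x_pred, t + dt)  # velocity at the predicted state
        x = x + 0.5 * dt * (v1 + v2)   # trapezoidal (Heun) corrector
        t += dt
    return x
```

As a sanity check, running `heun_sample` on the scalar ODE dx/dt = x from x = 1 converges to e as the step count grows, reflecting the method's second-order accuracy; in the paper, `velocity` would be the flow model (optionally wrapped by `cfg_velocity`) acting on image latents.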