FlowTurbo: Towards Real-time Flow-Based Image Generation with Velocity Refiner
Authors: Wenliang Zhao, Minglei Shi, Xumin Yu, Jie Zhou, Jiwen Lu
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We perform extensive experiments to evaluate our method. By applying FlowTurbo to different flow-based models, we obtain an acceleration ratio of 53.1%∼58.3% on class-conditional generation and 29.8%∼38.5% on text-to-image generation. |
| Researcher Affiliation | Academia | Wenliang Zhao, Department of Automation, Tsinghua University, wenliangzhao.thu@gmail.com; Minglei Shi, Department of Automation, Tsinghua University, stephenserrylei@gmail.com; Xumin Yu, Department of Automation, Tsinghua University, yuxumin98@gmail.com; Jie Zhou, Department of Automation, Tsinghua University, jzhou@tsinghua.edu.cn; Jiwen Lu, Department of Automation, Tsinghua University, lujiwen@tsinghua.edu.cn |
| Pseudocode | Yes | Algorithm 1 (Heun's Method Sampler) and Algorithm 2 (Pseudo Corrector Sampler) in Appendix B. A minimal Heun-sampler sketch is given after this table. |
| Open Source Code | Yes | Code is available at https://github.com/shiml20/FlowTurbo. |
| Open Datasets | Yes | For class-conditional image generation, we adopt a transformer-style flow-based model SiT-XL [24] pre-trained on ImageNet 256×256, and use ImageNet-1K [6] to train our velocity refiner. For text-to-image generation, we use a subset of LAION [34] containing only 50K images to train our velocity refiner. |
| Dataset Splits | No | The paper mentions using 'MS COCO 2017 [16] validation set' for FID calculation but does not explicitly state the train/validation splits used for training its own models or components. |
| Hardware Specification | Yes | In both tasks, we use a single NVIDIA A800 GPU to train the velocity refiner and find it converges within 6 hours. We use a batch size of 8 on a single A800 GPU to measure the latency of each method. A latency-timing sketch is given after this table. |
| Software Dependencies | No | Our code is implemented in PyTorch. (The '6' following 'PyTorch' in the paper is a footnote marker, not a version number; no specific PyTorch or other library versions are mentioned.) |
| Experiment Setup | Yes | Following common practice [24, 30], we adopt a classifier-free guidance (CFG) scale of 1.5. During training, we randomly sample t ∈ (0, 0.12] and compute the training objectives in (13). We use the AdamW [21] optimizer for all models. We use a constant learning rate of 5×10⁻⁵ and a batch size of 18 on a single A800 GPU. We use the AdamW [21] optimizer with a learning rate of 2e-5 and weight decay of 0.0. We adopt a batch size of 16 and set the warm-up steps to 100. We also use gradient clipping of 0.01 to stabilize training. A training-setup sketch is given after this table. |
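For readers who want a concrete picture of the sampler named in the Pseudocode row, below is a minimal sketch of a generic Heun (second-order) ODE sampler for a flow-based velocity model. It illustrates the standard technique only and is not the paper's Algorithm 1 or its pseudo corrector; the `velocity_model` callable, step count, linear time grid, and integration direction (noise at `t_start`, data at `t_end`) are assumptions.

```python
import torch

@torch.no_grad()
def heun_sample(velocity_model, x, num_steps=8, t_start=0.0, t_end=1.0):
    """Generic Heun (second-order) sampler for a flow-based model.

    `velocity_model(x, t)` is assumed to predict the velocity field v(x, t);
    the step count and linear time grid are illustrative choices, not the
    paper's exact configuration.
    """
    ts = torch.linspace(t_start, t_end, num_steps + 1, device=x.device)
    for i in range(num_steps):
        t_cur, t_next = ts[i], ts[i + 1]
        dt = t_next - t_cur
        v_cur = velocity_model(x, t_cur)          # slope at the current point
        x_pred = x + dt * v_cur                   # Euler predictor step
        v_next = velocity_model(x_pred, t_next)   # slope at the predicted point
        x = x + dt * 0.5 * (v_cur + v_next)       # Heun corrector update
    return x
```

Each Heun step costs two velocity evaluations; replacing part of that cost with a lighter component (the pseudo corrector and velocity refiner) is the acceleration lever the paper studies.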
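The Hardware Specification row quotes latency measurement with a batch size of 8 on a single A800. A common way to time GPU sampling in PyTorch is sketched below, using CUDA synchronization so that asynchronous kernel launches are fully counted; the `sample_fn` callable, latent shape, warm-up iterations, and repetition count are illustrative assumptions, not the paper's protocol.

```python
import time
import torch

@torch.no_grad()
def measure_latency(sample_fn, batch_size=8, latent_shape=(4, 32, 32),
                    warmup=3, repeats=10, device="cuda"):
    """Times `sample_fn(noise)` end to end on the GPU.

    `sample_fn` is a placeholder for any sampler (e.g. the Heun sketch above);
    the latent shape, warm-up, and repeat counts are illustrative choices.
    """
    noise = torch.randn(batch_size, *latent_shape, device=device)
    for _ in range(warmup):              # warm-up runs exclude one-off setup costs
        sample_fn(noise)
    torch.cuda.synchronize()             # make sure warm-up work has finished
    start = time.perf_counter()
    for _ in range(repeats):
        sample_fn(noise)
    torch.cuda.synchronize()             # wait for all queued GPU work to finish
    return (time.perf_counter() - start) / repeats
```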
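The Experiment Setup row lists the refiner's optimization hyper-parameters (AdamW, learning rate 2e-5, weight decay 0.0, 100 warm-up steps, gradient clipping of 0.01, and time values drawn from (0, 0.12]). The sketch below shows how such a configuration is typically wired up in PyTorch; the `refiner` module, the hypothetical `compute_refiner_loss` helper, and the linear warm-up shape are assumptions, and the paper's actual objective (its Eq. (13)) is not reproduced here.

```python
import torch
from torch.optim import AdamW
from torch.optim.lr_scheduler import LambdaLR

def build_optimizer(refiner, lr=2e-5, weight_decay=0.0, warmup_steps=100):
    """AdamW plus a linear warm-up, matching the hyper-parameters quoted above.

    The linear warm-up shape is an assumption; the paper only states the
    number of warm-up steps.
    """
    optimizer = AdamW(refiner.parameters(), lr=lr, weight_decay=weight_decay)
    scheduler = LambdaLR(optimizer, lambda step: min(1.0, (step + 1) / warmup_steps))
    return optimizer, scheduler

def training_step(refiner, batch, optimizer, scheduler, clip_value=0.01):
    # Draw t uniformly from (0, 0.12]: torch.rand lies in [0, 1), so 1 - rand lies in (0, 1].
    t = (1.0 - torch.rand(batch.shape[0], device=batch.device)) * 0.12
    loss = compute_refiner_loss(refiner, batch, t)  # placeholder for the paper's Eq. (13)
    optimizer.zero_grad()
    loss.backward()
    # "Gradient clipping of 0.01" is interpreted here as a gradient-norm clip.
    torch.nn.utils.clip_grad_norm_(refiner.parameters(), clip_value)
    optimizer.step()
    scheduler.step()
    return loss.item()
```

The quoted batch sizes (18 for class-conditional, 16 for text-to-image) would be set in the dataloader, which is omitted from this sketch.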