Improving the Training of Rectified Flows

Authors: Sangyun Lee, Zinan Lin, Giulia Fanti

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our evaluation shows that on several datasets (CIFAR-10 [Krizhevsky et al., 2009], ImageNet 64×64 [Deng et al., 2009]), our improved rectified flow outperforms the state-of-the-art distillation methods such as consistency distillation (CD) [Song et al., 2023] and progressive distillation (PD) [Salimans and Ho, 2022] in both one-step and two-step settings, and it rivals the performance of the improved consistency training (iCT) [Song et al., 2023] in terms of the Fréchet Inception Distance [Heusel et al., 2017] (FID). Our training techniques reduce the FID of the previous 2-rectified flow [Liu et al., 2022] by about 75% (12.21 → 3.07) on CIFAR-10. Ablations on three datasets show that the proposed techniques give a consistent and sizeable gain.
Researcher Affiliation | Collaboration | Sangyun Lee (Carnegie Mellon University, sangyunl@andrew.cmu.edu); Zinan Lin (Microsoft Research, zinanlin@microsoft.com); Giulia Fanti (Carnegie Mellon University, gfanti@andrew.cmu.edu)
Pseudocode | Yes | Pseudocode for Reflow is provided in Algorithm 1. Algorithm 2 shows the pseudocode for generating samples using the new update rule. (An illustrative reflow sketch is given after the table.)
Open Source Code | Yes | Code is available at https://github.com/sangyun884/rfpp.
Open Datasets | Yes | Our evaluation shows that on several datasets (CIFAR-10 [Krizhevsky et al., 2009], ImageNet 64×64 [Deng et al., 2009]), our improved rectified flow outperforms the state-of-the-art distillation methods such as consistency distillation (CD) [Song et al., 2023] and progressive distillation (PD) [Salimans and Ho, 2022] in both one-step and two-step settings, and it rivals the performance of the improved consistency training (iCT) [Song et al., 2023] in terms of the Fréchet Inception Distance [Heusel et al., 2017] (FID). License: The following are licenses for each dataset we use: CIFAR-10: Unknown; FFHQ: CC BY-NC-SA 4.0; AFHQ: CC BY-NC 4.0; ImageNet: Custom (research, non-commercial).
Dataset Splits | No | The paper mentions model evaluation and training details but does not provide explicit training/validation/test dataset splits (e.g., percentages or counts for each split). It refers to the 'entire training set' for FID calculation but does not specify a validation split. The quote 'For a two-step generation, we evaluate vθ at t = 0.99999 and t = 0.8.' refers to sampling time steps, not data splits (see the two-step sampling sketch after the table).
Hardware Specification | Yes | This work used Bridges-2 GPU at the Pittsburgh Supercomputing Center through allocation CIS240037 from the Advanced Cyberinfrastructure Coordination Ecosystem: Services & Support (ACCESS) program, which is supported by National Science Foundation grants 2138259, 2138286, 2138307, 2137603, and 2138296 [Boerner et al., 2023]. On ImageNet, training takes roughly 9 days with 64 NVIDIA V100 GPUs. On CIFAR-10 and FFHQ/AFHQ, it takes roughly 4 days with 16 and 8 V100 GPUs, respectively. For all cases, we use the NVIDIA DGX-2 cluster.
Software Dependencies | No | The paper mentions 'mixed-precision training [Micikevicius et al., 2017]' and implies the use of the Adam optimizer, but it does not specify version numbers for key software components or libraries such as Python, PyTorch, or CUDA.
Experiment Setup | Yes | Table 7: Training configurations for each dataset (learning rates are linearly ramped up for all datasets): CIFAR-10: batch size 512, dropout 0.13, learning rate 2e-4, warm-up 5000 iterations; FFHQ/AFHQ: batch size 256, dropout 0.25, learning rate 2e-4, warm-up 5000 iterations; ImageNet: batch size 2048, dropout 0.10, learning rate 1e-4, warm-up 2500 iterations. We use the Adam optimizer and the exponential moving average (EMA) with a 0.9999 decay rate for all datasets. (A hedged configuration sketch is given after the table.)
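
Algorithm 1 (Reflow) is cited in the Pseudocode row but not reproduced on this page. Below is a minimal PyTorch-style sketch of a generic reflow training step in the spirit of Liu et al. [2022]: a pretrained teacher velocity field generates (noise, sample) pairs by integrating its ODE, and a student velocity network is regressed onto the resulting straight-line couplings. The time convention (t = 1 as noise, t = 0 as data), the uniform timestep sampling, the plain MSE loss, the Euler solver, and all function names are assumptions for illustration; the paper's improved training techniques are not reflected here.

```python
# Illustrative sketch only -- NOT the paper's Algorithm 1.
# Assumes 4-D image tensors and velocity networks with signature v(x, t).
import torch
import torch.nn.functional as F


@torch.no_grad()
def solve_ode(teacher, z, num_steps=100):
    """Euler-integrate dx/dt = teacher(x, t) from t = 1 (noise) down to t = 0 (data)."""
    x = z
    ts = torch.linspace(1.0, 0.0, num_steps + 1, device=z.device)
    for i in range(num_steps):
        t = torch.full((x.shape[0],), ts[i].item(), device=z.device)
        x = x + (ts[i + 1] - ts[i]) * teacher(x, t)
    return x


def reflow_step(teacher, student, optimizer, batch_size, image_shape, device):
    """One 2-rectified-flow training step on teacher-generated (noise, sample) couplings."""
    z = torch.randn(batch_size, *image_shape, device=device)  # noise at t = 1
    x1 = solve_ode(teacher, z)                                # teacher sample at t = 0
    t = torch.rand(batch_size, device=device)                 # uniform timesteps (assumed)
    tb = t.view(-1, 1, 1, 1)
    xt = tb * z + (1.0 - tb) * x1                             # straight-line interpolation
    target = z - x1                                           # constant velocity along the line
    loss = F.mse_loss(student(xt, t), target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```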
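The Dataset Splits row quotes the timesteps used for two-step generation (t = 0.99999 and t = 0.8). As a concrete reading of that quote, here is a hedged two-step sampler using plain Euler updates under the same t = 1 (noise) to t = 0 (data) convention assumed above; it is not the paper's new update rule from Algorithm 2.

```python
import torch


@torch.no_grad()
def two_step_sample(model, batch_size, image_shape, device, ts=(0.99999, 0.8, 0.0)):
    """Two Euler steps of dx/dt = model(x, t), evaluating the network at t = 0.99999
    and t = 0.8 as quoted above. Plain Euler is an assumption, not Algorithm 2."""
    x = torch.randn(batch_size, *image_shape, device=device)  # start from noise near t = 1
    for t_cur, t_next in zip(ts[:-1], ts[1:]):
        t = torch.full((batch_size,), t_cur, device=device)
        v = model(x, t)                  # predicted velocity
        x = x + (t_next - t_cur) * v     # step toward t = 0 (data)
    return x
```

In this sketch, passing ts=(0.99999, 0.0) reduces the same loop to one-step generation.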
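For the Experiment Setup row, the following sketch wires up the quoted CIFAR-10 configuration (Adam, learning rate 2e-4 with a linear 5000-iteration warm-up, dropout 0.13, EMA decay 0.9999) in PyTorch. The toy network is a placeholder for the paper's actual architecture, which is not specified in this excerpt, and the warm-up scheduler choice is an assumption consistent with the quoted 'we linearly ramp up learning rates'.

```python
import copy
import torch
from torch import nn

# Placeholder network: stands in for the paper's velocity model; only the
# dropout probability (0.13 for CIFAR-10) comes from the quoted Table 7.
model = nn.Sequential(
    nn.Flatten(),
    nn.Linear(3 * 32 * 32, 512),
    nn.SiLU(),
    nn.Dropout(p=0.13),
    nn.Linear(512, 3 * 32 * 32),
)
ema_model = copy.deepcopy(model)  # EMA copy, updated with decay 0.9999

optimizer = torch.optim.Adam(model.parameters(), lr=2e-4)

# Linear learning-rate ramp-up over the first 5000 iterations (CIFAR-10 row).
warmup = torch.optim.lr_scheduler.LinearLR(
    optimizer, start_factor=1e-8, end_factor=1.0, total_iters=5000
)


@torch.no_grad()
def ema_update(ema_model, model, decay=0.9999):
    """Exponential moving average of parameters with the quoted 0.9999 decay."""
    for p_ema, p in zip(ema_model.parameters(), model.parameters()):
        p_ema.mul_(decay).add_(p, alpha=1 - decay)
```

Inside a training loop, warmup.step() and ema_update(ema_model, model) would be called once per iteration; per the quoted table, batch size 512 applies to CIFAR-10, with 256 for FFHQ/AFHQ and 2048 for ImageNet.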