Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Improving the Training of Rectified Flows
Authors: Sangyun Lee, Zinan Lin, Giulia Fanti
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our evaluation shows that on several datasets (CIFAR-10 [Krizhevsky et al., 2009], ImageNet 64×64 [Deng et al., 2009]), our improved rectified flow outperforms the state-of-the-art distillation methods such as consistency distillation (CD) [Song et al., 2023] and progressive distillation (PD) [Salimans and Ho, 2022] in both one-step and two-step settings, and it rivals the performance of the improved consistency training (iCT) [Song et al., 2023] in terms of the Fréchet Inception Distance [Heusel et al., 2017] (FID). Our training techniques reduce the FID of the previous 2-rectified flow [Liu et al., 2022] by about 75% (12.21 → 3.07) on CIFAR-10. Ablations on three datasets show that the proposed techniques give a consistent and sizeable gain. |
| Researcher Affiliation | Collaboration | Sangyun Lee Carnegie Mellon University EMAIL Zinan Lin Microsoft Research EMAIL Giulia Fanti Carnegie Mellon University EMAIL |
| Pseudocode | Yes | Pseudocode for Reflow is provided in Algorithm 1. Algorithm 2 shows the pseudocode for generating samples using the new update rule. |
| Open Source Code | Yes | Code is available at https://github.com/sangyun884/rfpp. |
| Open Datasets | Yes | Our evaluation shows that on several datasets (CIFAR-10 [Krizhevsky et al., 2009], ImageNet 64×64 [Deng et al., 2009]), our improved rectified flow outperforms the state-of-the-art distillation methods such as consistency distillation (CD) [Song et al., 2023] and progressive distillation (PD) [Salimans and Ho, 2022] in both one-step and two-step settings, and it rivals the performance of the improved consistency training (iCT) [Song et al., 2023] in terms of the Fréchet Inception Distance [Heusel et al., 2017] (FID). License: The following are licenses for each dataset we use: CIFAR-10: Unknown; FFHQ: CC BY-NC-SA 4.0; AFHQ: CC BY-NC 4.0; ImageNet: Custom (research, non-commercial). |
| Dataset Splits | No | The paper mentions evaluating models and training details but does not provide explicit training/validation/test dataset splits (e.g., percentages or counts for each split). It refers to the 'entire training set' for FID calculation but does not specify a validation set split. 'For a two-step generation, we evaluate v_θ at t = 0.99999 and t = 0.8.' refers to time steps, not data splits. |
| Hardware Specification | Yes | This work used Bridges-2 GPU at the Pittsburgh Supercomputing Center through allocation CIS240037 from the Advanced Cyberinfrastructure Coordination Ecosystem: Services & Support (ACCESS) program, which is supported by National Science Foundation grants 2138259, 2138286, 2138307, 2137603, and 2138296 [Boerner et al., 2023]. On ImageNet, the training takes roughly 9 days with 64 NVIDIA V100 GPUs. On CIFAR-10 and FFHQ/AFHQ, it takes roughly 4 days with 16 and 8 V100 GPUs, respectively. For all cases, we use the NVIDIA DGX-2 cluster. |
| Software Dependencies | No | The paper mentions 'mixed-precision training [Micikevicius et al., 2017]' and implies the use of Adam optimizer, but it does not specify version numbers for key software components or libraries such as Python, PyTorch, or CUDA. |
| Experiment Setup | Yes | Table 7: Training configurations for each dataset, listed as batch size / dropout / learning rate / warm-up iterations. CIFAR-10: 512 / 0.13 / 2e-4 / 5000; FFHQ / AFHQ: 256 / 0.25 / 2e-4 / 5000; ImageNet: 2048 / 0.10 / 1e-4 / 2500. We linearly ramp up learning rates for all datasets. We use Adam optimizer. We use the exponential moving average (EMA) with 0.9999 decay rate for all datasets. |
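
The Experiment Setup row can be written down as a small configuration sketch. The values below are copied from the quoted Table 7. Everything else is an assumption made for illustration and is not taken from the paper or its repository: the names `TrainConfig`, `CONFIGS`, `learning_rate`, and `ema_update` are hypothetical, the ramp is assumed to start at zero, and the learning rate is assumed to stay constant after warm-up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainConfig:
    """Per-dataset hyperparameters as quoted from Table 7 (hypothetical container)."""
    batch_size: int
    dropout: float
    peak_lr: float
    warmup_iters: int
    ema_decay: float = 0.9999  # EMA decay rate reported for all datasets


# Dictionary keys are illustrative labels, not identifiers from the paper's code.
CONFIGS = {
    "cifar10": TrainConfig(batch_size=512, dropout=0.13, peak_lr=2e-4, warmup_iters=5000),
    "ffhq_afhq": TrainConfig(batch_size=256, dropout=0.25, peak_lr=2e-4, warmup_iters=5000),
    "imagenet": TrainConfig(batch_size=2048, dropout=0.10, peak_lr=1e-4, warmup_iters=2500),
}


def learning_rate(cfg: TrainConfig, step: int) -> float:
    """Linear ramp-up to peak_lr over warmup_iters steps.

    Assumption: the ramp starts at 0 and the rate is held constant afterwards;
    the quoted text only says the learning rate is ramped up linearly.
    """
    return cfg.peak_lr * min(1.0, step / cfg.warmup_iters)


def ema_update(ema_params: list[float], params: list[float], decay: float) -> list[float]:
    """One exponential-moving-average step over a flat list of parameter values."""
    return [decay * e + (1.0 - decay) * p for e, p in zip(ema_params, params)]


if __name__ == "__main__":
    cfg = CONFIGS["cifar10"]
    for step in (0, 1000, 5000, 20000):
        print(step, learning_rate(cfg, step))
```

In a real training loop the same schedule would typically be handed to the optimizer's scheduler, and the EMA step would run over model tensors rather than Python floats; the plain-Python form here only keeps the sketch dependency-free.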