Scaling Rectified Flow Transformers for High-Resolution Image Synthesis

Authors: Patrick Esser, Sumith Kulal, Andreas Blattmann, Rahim Entezari, Jonas Müller, Harry Saini, Yam Levi, Dominik Lorenz, Axel Sauer, Frederic Boesel, Dustin Podell, Tim Dockhorn, Zion English, Robin Rombach

ICML 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Through a large-scale study, we demonstrate the superior performance of this approach compared to established diffusion formulations for high-resolution text-to-image synthesis. [...] We train models on ImageNet (Russakovsky et al., 2014) and CC12M (Changpinyo et al., 2021), and evaluate both the training and the EMA weights of the models during training using validation losses, CLIP scores (Radford et al., 2021; Hessel et al., 2021), and FID (Heusel et al., 2017) under different sampler settings."
Researcher Affiliation | Industry | Patrick Esser*, Sumith Kulal, Andreas Blattmann, Rahim Entezari, Jonas Müller, Harry Saini, Yam Levi, Dominik Lorenz, Axel Sauer, Frederic Boesel, Dustin Podell, Tim Dockhorn, Zion English, Robin Rombach*; Stability AI
Pseudocode | Yes | "Algorithm 1 Finding Duplicate Items in a Cluster [...] Algorithm 2 Detecting Memorization in Generated Images"
Open Source Code | Yes | "The core contributions of our work are: [...] We make results, code, and model weights publicly available."
Open Datasets | Yes | "We train models on ImageNet (Russakovsky et al., 2014) and CC12M (Changpinyo et al., 2021)..."
Dataset Splits | Yes | "We train models on ImageNet (Russakovsky et al., 2014) and CC12M (Changpinyo et al., 2021), and evaluate both the training and the EMA weights of the models during training using validation losses, CLIP scores (Radford et al., 2021; Hessel et al., 2021), and FID (Heusel et al., 2017) under different sampler settings. [...] All metrics are evaluated on the COCO-2014 validation split (Lin et al., 2014)." (See the evaluation sketch below the table.)
Hardware Specification | No | The paper mentions "GPU" and "bf16-mixed precision" (which implies certain hardware capabilities) but does not specify the exact GPU models (e.g., NVIDIA A100) or CPU details used for the experiments.
Software Dependencies | No | The paper mentions the "AdamW optimizer (Loshchilov & Hutter, 2017)" and "autofaiss (2023)" but does not provide version numbers for these software components or for other libraries used in the experiments.
Experiment Setup | Yes | "In this experiment, we train all models using a global batch size of 1024 using the AdamW optimizer (Loshchilov & Hutter, 2017) with a learning rate of 10^-4 and 1000 linear warmup steps. We use mixed-precision training and keep a copy of the model weights which gets updated every 100 training batches with an exponential moving average (EMA) using a decay factor of 0.99."
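A minimal sketch of the optimization setup quoted in the Experiment Setup row, assuming a PyTorch-style training loop: AdamW with learning rate 1e-4, 1000 linear warmup steps, bf16 mixed precision, and an EMA weight copy updated every 100 batches with decay 0.99. Here `model`, `compute_loss`, and `dataloader` are hypothetical placeholders rather than the authors' code, and the global batch size of 1024 is assumed to be handled by the dataloader (e.g., via data parallelism).

```python
import copy
import torch

def train(model, dataloader, compute_loss, total_steps, device="cuda"):
    # AdamW with the reported learning rate of 1e-4.
    opt = torch.optim.AdamW(model.parameters(), lr=1e-4)
    # Linear warmup from 0 to the base learning rate over the first 1000 steps.
    sched = torch.optim.lr_scheduler.LambdaLR(opt, lambda s: min(1.0, (s + 1) / 1000))
    # Separate EMA copy of the weights, updated every 100 batches with decay 0.99.
    ema_model = copy.deepcopy(model).eval()
    ema_decay, ema_every = 0.99, 100

    model.train()
    for step, batch in enumerate(dataloader, start=1):
        # bf16 mixed precision for the forward pass.
        with torch.autocast(device_type=device, dtype=torch.bfloat16):
            loss = compute_loss(model, batch)   # placeholder training loss
        loss.backward()
        opt.step()
        sched.step()
        opt.zero_grad(set_to_none=True)

        if step % ema_every == 0:
            # EMA update: ema <- 0.99 * ema + 0.01 * current weights.
            with torch.no_grad():
                for p_ema, p in zip(ema_model.parameters(), model.parameters()):
                    p_ema.mul_(ema_decay).add_(p, alpha=1.0 - ema_decay)
        if step >= total_steps:
            break
    return model, ema_model
```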
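The Dataset Splits row reports evaluation with validation losses, CLIP scores, and FID on the COCO-2014 validation split. Below is a hedged sketch of how such metrics could be computed with torchmetrics; it is not the authors' evaluation code, and `sample_images` and `coco_val_loader` are hypothetical stand-ins for the model's sampler and a COCO-2014 validation dataloader.

```python
import torch
from torchmetrics.image.fid import FrechetInceptionDistance
from torchmetrics.multimodal.clip_score import CLIPScore

@torch.no_grad()
def evaluate(sample_images, coco_val_loader, device="cuda"):
    fid = FrechetInceptionDistance(feature=2048).to(device)
    clip = CLIPScore(model_name_or_path="openai/clip-vit-large-patch14").to(device)

    for real_images, captions in coco_val_loader:         # uint8 images in [0, 255]
        real_images = real_images.to(device)
        fake_images = sample_images(captions).to(device)   # generated samples, uint8

        fid.update(real_images, real=True)                 # reference statistics
        fid.update(fake_images, real=False)                # generated statistics
        clip.update(fake_images, captions)                 # caption-image agreement

    return {"FID": fid.compute().item(), "CLIP score": clip.compute().item()}
```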