ReNO: Enhancing One-step Text-to-Image Models through Reward-based Noise Optimization
Authors: Luca Eyring, Shyamgopal Karthik, Karsten Roth, Alexey Dosovitskiy, Zeynep Akata
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Remarkably, solving this optimization problem with gradient ascent for 50 iterations yields impressive results on four different one-step models across two competitive benchmarks, T2I-CompBench and GenEval. |
| Researcher Affiliation | Collaboration | Luca Eyring (1,2,3), Shyamgopal Karthik (1,2,3,4), Karsten Roth (2,3,4), Alexey Dosovitskiy (5), Zeynep Akata (1,2,3); 1: Technical University of Munich, 2: Munich Center of Machine Learning, 3: Helmholtz Munich, 4: University of Tübingen & Tübingen AI Center, 5: Inceptive |
| Pseudocode | Yes | Algorithm 1 ReNO |
| Open Source Code | Yes | Code is available at https://github.com/ExplainableML/ReNO. |
| Open Datasets | Yes | First, we evaluate on T2I-CompBench [37], which comprises 6000 compositional prompts spanning six categories... Second, we employ GenEval [28], consisting of 552 object-focused prompts... Finally, we utilize Parti-Prompts [103], a collection of more than 1600 complex prompts... |
| Dataset Splits | No | The paper evaluates on established benchmarks (T2I-CompBench, GenEval, Parti-Prompts), which imply predefined splits, but it does not explicitly state the train/validation/test dataset splits used for its own experiments (e.g., specific percentages or sample counts). |
| Hardware Specification | Yes | Then, all of the models can be optimized on a single A100 GPU in 20-50 seconds, and e.g., SD-Turbo requires only 15GB VRAM for the entire optimization process. [...] Table 6: Computational cost comparison of ReNO optimizing four reward models on an A100 GPU. |
| Software Dependencies | No | Our code is built with PyTorch [67] and is mainly based on the diffusers library [90]. |
| Experiment Setup | Yes | Throughout all experiments, we optimize Equation (7) for 50 steps using gradient ascent with Nesterov momentum and gradient norm clipping for stability. [...] We use λreg = 0.01 for all our experiments. For the learning rate, we use µ = 5 for all our 512 × 512 models and µ = 10 for Hyper-SDXL that generates 1024 × 1024 as we found this to give a good balance between exploration, improvements, and fast convergence. |
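
The quoted setup (50 gradient-ascent steps with Nesterov momentum, gradient norm clipping, λreg = 0.01, µ = 5 for 512 × 512 models) maps onto a short optimization loop over the initial noise. Below is a minimal PyTorch sketch of such a reward-based noise optimization loop, assuming a differentiable one-step generator and reward model; the names `one_step_model` and `reward_fn`, the latent shape, the momentum value, the clipping threshold, and the exact form of the noise regularizer are illustrative assumptions, not the paper's implementation.

```python
import torch

def optimize_noise(one_step_model, reward_fn, prompt,
                   steps=50, lr=5.0, lambda_reg=0.01, max_grad_norm=0.1):
    """Sketch of ReNO-style noise optimization.

    one_step_model(noise, prompt) -> image and reward_fn(image, prompt) -> scalar
    are hypothetical placeholders for a one-step T2I generator and a
    differentiable reward model. steps, lr, and lambda_reg follow the paper's
    stated setup; momentum, max_grad_norm, the latent shape, and the
    Gaussian-prior penalty below are assumptions.
    """
    # Assumed latent shape for a 512x512 latent-diffusion-style generator.
    noise = torch.randn(1, 4, 64, 64, requires_grad=True)
    opt = torch.optim.SGD([noise], lr=lr, momentum=0.9, nesterov=True)

    for _ in range(steps):
        opt.zero_grad()
        image = one_step_model(noise, prompt)        # single generation step
        reward = reward_fn(image, prompt)            # differentiable reward
        reg = lambda_reg * noise.pow(2).mean()       # assumed prior-keeping penalty
        loss = -(reward - reg)                       # maximize reward via gradient ascent
        loss.backward()
        torch.nn.utils.clip_grad_norm_([noise], max_grad_norm)  # stability
        opt.step()

    return noise.detach()
```

In this sketch the generator and reward model stay frozen; only the initial noise receives gradients, which is what keeps the per-prompt optimization lightweight enough to run in the 20-50 seconds on a single A100 reported above.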