ReNO: Enhancing One-step Text-to-Image Models through Reward-based Noise Optimization

Authors: Luca Eyring, Shyamgopal Karthik, Karsten Roth, Alexey Dosovitskiy, Zeynep Akata

NeurIPS 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Remarkably, solving this optimization problem with gradient ascent for 50 iterations yields impressive results on four different one-step models across two competitive benchmarks, T2I-CompBench and GenEval.
Researcher Affiliation | Collaboration | Luca Eyring (1,2,3), Shyamgopal Karthik (1,2,3,4), Karsten Roth (2,3,4), Alexey Dosovitskiy (5), Zeynep Akata (1,2,3); 1 Technical University of Munich, 2 Munich Center of Machine Learning, 3 Helmholtz Munich, 4 University of Tübingen & Tübingen AI Center, 5 Inceptive
Pseudocode | Yes | Algorithm 1: ReNO
Open Source Code | Yes | Code is available at https://github.com/ExplainableML/ReNO.
Open Datasets | Yes | First, we evaluate on T2I-CompBench [37], which comprises 6000 compositional prompts spanning six categories... Second, we employ GenEval [28], consisting of 552 object-focused prompts... Finally, we utilize Parti-Prompts [103], a collection of more than 1600 complex prompts...
Dataset Splits | No | The paper evaluates on established benchmarks (T2I-CompBench, GenEval, Parti-Prompts), which imply predefined splits, but it does not explicitly state the train/validation/test dataset splits used for its own experiments (e.g., specific percentages or sample counts).
Hardware Specification | Yes | Then, all of the models can be optimized on a single A100 GPU in 20-50 seconds, and e.g., SD-Turbo requires only 15GB VRAM for the entire optimization process. [...] Table 6: Computational cost comparison of ReNO optimizing four reward models on an A100 GPU.
Software Dependencies | No | Our code is built with PyTorch [67] and is mainly based on the diffusers library [90].
Experiment Setup | Yes | Throughout all experiments, we optimize Equation (7) for 50 steps using gradient ascent with Nesterov momentum and gradient norm clipping for stability. [...] We use λ_reg = 0.01 for all our experiments. For the learning rate, we use µ = 5 for all our 512 × 512 models and µ = 10 for Hyper-SDXL that generates 1024 × 1024, as we found this to give a good balance between exploration, improvements, and fast convergence.
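To make the quoted setup concrete, below is a minimal PyTorch sketch of reward-based noise optimization using the hyperparameters listed in the Experiment Setup row (50 steps, gradient ascent with Nesterov momentum, gradient norm clipping, λ_reg = 0.01, µ = 5). The `generator` and `reward_fn` callables, the momentum value, the clipping threshold, and the form of the regularizer from Equation (7) are placeholders/assumptions for illustration; the authors' actual implementation is in the linked repository.

```python
import torch

def reno_optimize(generator, reward_fn, prompt, noise_shape,
                  steps=50, lr=5.0, lam_reg=0.01, clip_norm=0.1,
                  device="cuda"):
    """Sketch: optimize the initial noise of a one-step T2I model to maximize a reward."""
    # The initial latent noise is the only optimized variable.
    noise = torch.randn(noise_shape, device=device, requires_grad=True)
    # SGD with Nesterov momentum; the momentum value (0.9) is an assumption.
    opt = torch.optim.SGD([noise], lr=lr, momentum=0.9, nesterov=True)

    for _ in range(steps):
        opt.zero_grad()
        image = generator(noise, prompt)       # single forward pass of the one-step model
        reward = reward_fn(image, prompt)      # scalar reward, e.g. a (weighted) sum of reward models
        # Placeholder regularizer keeping the noise close to a standard Gaussian;
        # the exact term of Equation (7) is not reproduced on this page.
        objective = reward - lam_reg * noise.pow(2).mean()
        (-objective).backward()                # gradient ascent on the objective
        torch.nn.utils.clip_grad_norm_([noise], max_norm=clip_norm)  # clipping for stability
        opt.step()

    return noise.detach()
```

One run of this loop corresponds to the 20-50 seconds on a single A100 GPU reported in the Hardware Specification row, since each step is a single forward/backward pass through the one-step generator and the reward models.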