Confidence-aware Reward Optimization for Fine-tuning Text-to-Image Models
Authors: Kyuyoung Kim, Jongheon Jeong, Minyong An, Mohammad Ghavamzadeh, Krishnamurthy Dj Dvijotham, Jinwoo Shin, Kimin Lee
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our evaluation of several state-of-the-art reward models on this benchmark reveals their frequent misalignment with human assessment. We empirically demonstrate that overoptimization occurs notably when a poorly aligned reward model is used as the fine-tuning objective. |
| Researcher Affiliation | Collaboration | ¹KAIST, ²Korea University, ³Yonsei University, ⁴Amazon, ⁵Google DeepMind |
| Pseudocode | Yes | Algorithm 1 outlines detailed pseudocode for instructing ChatGPT to generate a contrastive prompt set based on a given input prompt x0. |
| Open Source Code | No | The paper provides links to the benchmark data and third-party libraries used, but does not explicitly state that the source code for their specific method (TextNorm) or their experimental setup is open-sourced. |
| Open Datasets | Yes | To facilitate our evaluation, we introduce the Text-Image Alignment Assessment¹ (TIA2) benchmark, a diverse compilation of text prompts, images, and human annotations. ... ¹The benchmark data is available at https://github.com/kykim0/TextNorm. |
| Dataset Splits | No | The paper mentions evaluating reward models and selecting checkpoints (e.g., the 'earliest checkpoint at which the number of generated images with higher scores... reaches the maximum'), which implies a form of validation. However, it does not specify a distinct validation split with percentages or counts used for training their models. |
| Hardware Specification | No | The paper mentions using Stable Diffusion v2.1 as the base text-to-image model and fine-tuning it, but it does not specify any hardware details like GPU models, CPU types, or memory used for the experiments. |
| Software Dependencies | No | The paper states: 'Specifically, we used the diffusers library6 for the SFT experiments and the trl library7 for the RL experiments.' However, it does not provide version numbers for these libraries or any other software dependencies, which would be necessary for full reproducibility. |
| Experiment Setup | Yes | Table 4: Summary of hyperparameters used for SFT and RL fine-tuning. Parameters for Diffusion Denoising steps, Guidance scale, Optimization (Optimizer, Learning rate, Weight decay, β1, β2, ϵ, Max gradient norm, Batch size, Samples per iteration, Gradient updates per iteration, Mixed precision) are listed with specific values. |