Confidence-aware Reward Optimization for Fine-tuning Text-to-Image Models

Authors: Kyuyoung Kim, Jongheon Jeong, Minyong An, Mohammad Ghavamzadeh, Krishnamurthy Dj Dvijotham, Jinwoo Shin, Kimin Lee

ICLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our evaluation of several state-of-the-art reward models on this benchmark reveals their frequent misalignment with human assessment. We empirically demonstrate that overoptimization occurs notably when a poorly aligned reward model is used as the fine-tuning objective.
Researcher Affiliation | Collaboration | KAIST, Korea University, Yonsei University, Amazon, Google DeepMind
Pseudocode | Yes | Algorithm 1 outlines detailed pseudocode for instructing ChatGPT to generate a contrastive prompt set based on a given input prompt x0. (See the first sketch after the table.)
Open Source Code | No | The paper provides links to the benchmark data and third-party libraries used, but does not explicitly state that the source code for their specific method (TextNorm) or their experimental setup is open-sourced.
Open Datasets | Yes | To facilitate our evaluation, we introduce the Text-Image Alignment Assessment (TIA2) benchmark, a diverse compilation of text prompts, images, and human annotations. ... The benchmark data is available at https://github.com/kykim0/TextNorm.
Dataset Splits | No | The paper mentions evaluating reward models and selecting checkpoints (e.g., the 'earliest checkpoint at which the number of generated images with higher scores... reaches the maximum'), implying a form of validation. However, it does not specify a distinct validation split with percentages or counts for training their models. (See the checkpoint-selection sketch after the table.)
Hardware Specification | No | The paper mentions using Stable Diffusion v2.1 as the base text-to-image model and fine-tuning it, but it does not specify any hardware details such as GPU models, CPU types, or memory used for the experiments.
Software Dependencies | No | The paper states: 'Specifically, we used the diffusers library for the SFT experiments and the trl library for the RL experiments.' However, it does not provide version numbers for these libraries or for any other software dependencies, which are necessary for full reproducibility. (See the version-recording sketch after the table.)
Experiment Setup | Yes | Table 4 summarizes the hyperparameters used for SFT and RL fine-tuning: diffusion settings (denoising steps, guidance scale) and optimization settings (optimizer, learning rate, weight decay, β1, β2, ϵ, max gradient norm, batch size, samples per iteration, gradient updates per iteration, mixed precision), each with a specific value. (See the config sketch after the table.)
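
The Pseudocode row refers to Algorithm 1, in which ChatGPT is instructed to generate a contrastive prompt set for a given input prompt. A minimal sketch of that prompting step, assuming the OpenAI chat-completions client; the instruction wording, model name, and `contrastive_prompts` helper are placeholders, not the paper's exact template:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def contrastive_prompts(prompt: str, n: int = 4) -> list[str]:
    """Ask a chat model for n prompts that are semantically close to
    `prompt` but each alter one salient attribute (a stand-in for the
    paper's Algorithm 1 instruction template)."""
    instruction = (
        f"Given the prompt: '{prompt}', write {n} variations that keep "
        "the overall scene but change one salient attribute each. "
        "Return one prompt per line."
    )
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",  # placeholder model choice
        messages=[{"role": "user", "content": instruction}],
    )
    text = response.choices[0].message.content
    return [line.strip() for line in text.splitlines() if line.strip()]
```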
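The checkpoint-selection rule quoted in the Dataset Splits row (the earliest checkpoint at which the count of higher-scoring generated images peaks) amounts to a first-argmax. A minimal sketch, assuming a hypothetical list `counts` holding one such count per checkpoint in training order:

```python
def select_checkpoint(counts: list[int]) -> int:
    """Return the index of the earliest checkpoint whose count of
    higher-scoring generated images reaches the maximum."""
    # list.index returns the first occurrence, i.e. the earliest checkpoint
    return counts.index(max(counts))

assert select_checkpoint([3, 7, 7, 5]) == 1
```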
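Because the Software Dependencies row notes that no library versions are given, anyone reproducing the setup would need to pin versions independently. One way to record the installed versions of the named libraries, plus likely companions (torch and transformers are assumptions here, not mentioned in the excerpt):

```python
from importlib.metadata import PackageNotFoundError, version

# diffusers and trl are named in the paper; the others are assumed.
for pkg in ("diffusers", "trl", "torch", "transformers"):
    try:
        print(f"{pkg}=={version(pkg)}")
    except PackageNotFoundError:
        print(f"{pkg} is not installed")
```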
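The Experiment Setup row lists the hyperparameter names from Table 4 without their values. A sketch of how those fields might be gathered into a single config object, with values deliberately left unset because the excerpt does not reproduce them (treating the optimizer as Adam-family is an assumption suggested by the β1/β2/ϵ parameters):

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class FinetuneConfig:
    """Fields mirroring the hyperparameters listed in Table 4.

    Values are left as None: the concrete numbers are in the paper,
    not in this excerpt.
    """
    # Diffusion sampling
    denoising_steps: Optional[int] = None
    guidance_scale: Optional[float] = None
    # Optimization (beta1/beta2/eps suggest an Adam-family optimizer)
    optimizer: Optional[str] = None
    learning_rate: Optional[float] = None
    weight_decay: Optional[float] = None
    betas: Tuple[Optional[float], Optional[float]] = (None, None)
    eps: Optional[float] = None
    max_grad_norm: Optional[float] = None
    batch_size: Optional[int] = None
    samples_per_iteration: Optional[int] = None
    gradient_updates_per_iteration: Optional[int] = None
    mixed_precision: Optional[str] = None  # e.g. "fp16" or "bf16"
```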