Aligning Diffusion Models by Optimizing Human Utility

Authors: Shufan Li, Konstantinos Kallidromitis, Akash Gokul, Yusuke Kato, Kazuki Kozuka

NeurIPS 2024

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We comprehensively evaluate Diffusion-KTO through quantitative and qualitative analyses to demonstrate its effectiveness in aligning text-to-image diffusion models with a preference distribution. |
| Researcher Affiliation | Collaboration | Shufan Li (University of California, Los Angeles), Konstantinos Kallidromitis (Panasonic AI Research), Akash Gokul (Salesforce AI Research), Yusuke Kato (Panasonic AI Research), Kazuki Kozuka (Panasonic AI Research) |
| Pseudocode | No | The paper describes the methodology using equations and textual explanations, but it does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code is available at https://github.com/jacklishufan/diffusion-kto |
| Open Datasets | Yes | We fine-tune Stable Diffusion v1-5 (SD v1-5) [39] (Creative ML Open RAIL-M license) with the Diffusion-KTO objective, using the Kahneman-Tversky utility function, on the Pick-a-Pic v2 dataset [27] (MIT license). A hedged sketch of this objective appears below the table. |
| Dataset Splits | No | The paper describes how the Pick-a-Pic dataset was partitioned into desirable and undesirable samples for training (237,530 desirable and 690,538 undesirable samples) and names specific test sets (Pick-a-Pic, HPS v2, Parti Prompts) for evaluation, but it does not describe a validation split used for hyperparameter tuning. A sketch of one plausible partitioning rule appears below the table. |
| Hardware Specification | Yes | We train Stable Diffusion v1-5 (SD v1-5) on 4 NVIDIA A6000 GPUs with a batch size of 2 per GPU using the Adam optimizer. |
| Software Dependencies | No | The paper mentions various models and datasets used (e.g., Stable Diffusion v1-5, Pick-a-Pic v2, CLIP, HPS v2), but it does not specify version numbers for software dependencies such as Python, PyTorch, or other libraries used in the implementation. |
| Experiment Setup | Yes | We train Stable Diffusion v1-5 (SD v1-5) ... with a batch size of 2 per GPU using the Adam optimizer. We use a base learning rate of 1e-7 with 1000 warm-up steps for a total of 10000 iterations. We set β to 5000. The reported hyperparameters are collected in a configuration sketch below the table. |
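
The Open Datasets and Experiment Setup rows describe fine-tuning with the Diffusion-KTO objective under the Kahneman-Tversky utility. The following is a minimal sketch of one training step under that objective, assuming a diffusers-style UNet interface; `unet`, `ref_unet`, and the `desirable` labels are illustrative names, and the implicit-reward formulation is a common approximation rather than the authors' exact code.

```python
import torch
import torch.nn.functional as F

def diffusion_kto_loss(unet, ref_unet, noisy_latents, timesteps,
                       text_embeds, noise, desirable, beta=5000.0):
    """One Diffusion-KTO-style loss step (a sketch, not the authors' code).

    desirable: bool tensor of shape (batch,), True for desirable samples.
    beta: regularization strength toward the frozen reference model
          (the paper reports beta = 5000).
    """
    # Denoising error of the trained model and the frozen reference.
    pred = unet(noisy_latents, timesteps,
                encoder_hidden_states=text_embeds).sample
    with torch.no_grad():
        ref_pred = ref_unet(noisy_latents, timesteps,
                            encoder_hidden_states=text_embeds).sample

    err = F.mse_loss(pred, noise, reduction="none").mean(dim=(1, 2, 3))
    ref_err = F.mse_loss(ref_pred, noise, reduction="none").mean(dim=(1, 2, 3))

    # Implicit per-sample reward: improvement over the reference model.
    reward = beta * (ref_err - err)

    # Kahneman-Tversky utility = sigmoid of the reward; the sign flips
    # for undesirable samples. Training minimizes the negative utility.
    sign = desirable.float() * 2.0 - 1.0
    utility = torch.sigmoid(sign * reward)
    return -utility.mean()
```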
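
The Dataset Splits row reports a partition of Pick-a-Pic into 237,530 desirable and 690,538 undesirable images, but the exact rule is not quoted above. The sketch below assumes a simple per-image win-rate threshold, one plausible way to derive such a split from pairwise preferences.

```python
from collections import defaultdict

def partition_by_win_rate(comparisons, threshold=0.5):
    """Split images into desirable/undesirable sets by win rate.

    comparisons: iterable of (winner_id, loser_id) pairs, e.g. from
    Pick-a-Pic v2. The 0.5 threshold is an illustrative assumption,
    not the rule the paper used to obtain its reported split sizes.
    """
    wins = defaultdict(int)
    total = defaultdict(int)
    for winner, loser in comparisons:
        wins[winner] += 1
        total[winner] += 1
        total[loser] += 1
    desirable = {img for img in total if wins[img] / total[img] > threshold}
    undesirable = set(total) - desirable
    return desirable, undesirable
```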
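
For reference, the hardware and training details quoted in the table amount to the following configuration. Only the values come from the paper; the key names and the Hugging Face model id are illustrative assumptions, not the authors' config schema.

```python
# Hyperparameters as reported in the paper; key names and the model id
# are illustrative assumptions, not the authors' actual configuration.
train_config = {
    "base_model": "runwayml/stable-diffusion-v1-5",  # assumed HF id for SD v1-5
    "train_dataset": "Pick-a-Pic v2",
    "num_gpus": 4,                  # NVIDIA A6000
    "batch_size_per_gpu": 2,
    "optimizer": "Adam",
    "base_learning_rate": 1e-7,
    "lr_warmup_steps": 1000,
    "max_train_steps": 10000,
    "beta": 5000.0,                 # utility/regularization strength
}
```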