Aligning Diffusion Models by Optimizing Human Utility
Authors: Shufan Li, Konstantinos Kallidromitis, Akash Gokul, Yusuke Kato, Kazuki Kozuka
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We comprehensively evaluate Diffusion-KTO through quantitative and qualitative analyses to demonstrate its effectiveness in aligning text-to-image diffusion models with a preference distribution. |
| Researcher Affiliation | Collaboration | Shufan Li (1), Konstantinos Kallidromitis (2), Akash Gokul (3), Yusuke Kato (2), Kazuki Kozuka (2); (1) University of California, Los Angeles; (2) Panasonic AI Research; (3) Salesforce AI Research |
| Pseudocode | No | The paper describes the methodology using equations and textual explanations, but it does not contain structured pseudocode or algorithm blocks (an illustrative sketch of such a utility-based objective appears after the table). |
| Open Source Code | Yes | Code is available at https://github.com/jacklishufan/diffusion-kto |
| Open Datasets | Yes | We fine-tune Stable Diffusion v1-5 (SD v1-5) [39] (Creative ML Open RAIL-M license) with the Diffusion-KTO objective, using the Kahneman-Tversky utility function, on the Pick-a-Pic v2 dataset [27] (MIT license). |
| Dataset Splits | No | The paper describes how the Pick-a-Pic dataset was partitioned into desirable and undesirable samples for training (237,530 desirable and 690,538 undesirable samples) and mentions specific test sets (Pick-a-Pic, HPS v2, Parti Prompts) for evaluation. However, it does not describe a validation split used for hyperparameter tuning during training (a hypothetical partitioning sketch follows the table). |
| Hardware Specification | Yes | We train Stable Diffusion v1-5 (SD v1-5) on 4 NVIDIA A6000 GPUs with a batch size of 2 per GPU using the Adam optimizer. |
| Software Dependencies | No | The paper mentions various models and datasets used (e.g., Stable Diffusion v1-5, Pick-a-Pic v2, CLIP, HPS v2), but it does not specify explicit version numbers for software dependencies like Python, PyTorch, or other specific libraries used in the implementation. |
| Experiment Setup | Yes | We train Stable Diffusion v1-5 (SD v1-5) ... with a batch size of 2 per GPU using the Adam optimizer. We use a base learning rate of 1e-7 with 1000 warm-up steps for a total of 10000 iterations. We set β to 5000. (These settings are gathered into a config sketch after the table.) |
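
The Pseudocode row notes that the paper presents its method only through equations and prose. As a point of reference, below is a minimal, illustrative sketch of what a per-sample utility loss of this general shape could look like. It assumes a Diffusion-DPO-style implicit reward (the gap between a frozen reference model's and the fine-tuned model's denoising errors) passed through a sigmoid utility; the function name, argument layout, and exact reward form are assumptions made here, not the authors' implementation, which is available in the linked repository.

```python
import torch


def diffusion_kto_loss(eps_pred, eps_pred_ref, eps_target, desirable, beta=5000.0):
    """Illustrative per-sample utility loss (not the authors' verbatim code).

    eps_pred:     noise prediction from the trainable UNet, shape (B, C, H, W)
    eps_pred_ref: noise prediction from a frozen reference UNet on the same inputs
    eps_target:   the sampled noise used to corrupt the latents
    desirable:    tensor of +1 (desirable sample) or -1 (undesirable sample), shape (B,)
    beta:         utility temperature; the paper reports beta = 5000
    """
    # Per-sample denoising errors for the trained and frozen reference models.
    err = (eps_pred - eps_target).pow(2).flatten(1).mean(dim=1)
    err_ref = (eps_pred_ref - eps_target).pow(2).flatten(1).mean(dim=1)

    # Assumed implicit reward: positive when the fine-tuned model fits the
    # sample better than the reference model does.
    implicit_reward = err_ref - err

    # Sigmoid utility of the signed reward; maximizing utility is implemented
    # as minimizing its negative.
    utility = torch.sigmoid(beta * desirable * implicit_reward)
    return -utility.mean()
```

In a training step, `eps_pred` would come from the trainable UNet and `eps_pred_ref` from a frozen copy evaluated under `torch.no_grad()` on the same noised latents and timestep.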
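The Dataset Splits row quotes the partition of Pick-a-Pic v2 into desirable and undesirable samples. A hypothetical preprocessing step of this kind might look like the following; the field names (`caption`, `image_0_uid`, `image_1_uid`, `label_0`, `label_1`) follow the public Pick-a-Pic schema as an assumption, and the tie handling is illustrative rather than the authors' exact rule.

```python
def split_preference_pair(record):
    """Binarize one pairwise comparison into (desirable, undesirable) samples.

    Returns None for ties, where neither image is clearly preferred.
    Field names are assumed from the public Pick-a-Pic v2 schema.
    """
    if record["label_0"] == record["label_1"]:
        return None
    if record["label_0"] > record["label_1"]:
        winner, loser = record["image_0_uid"], record["image_1_uid"]
    else:
        winner, loser = record["image_1_uid"], record["image_0_uid"]
    # Each image is paired with its prompt and a binary desirability label.
    return (record["caption"], winner, True), (record["caption"], loser, False)
```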
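The Experiment Setup row lists the reported hyperparameters. Gathered into a single illustrative configuration, where the key names and the Hub model id are assumptions and only the numeric values come from the quoted text:

```python
# Hypothetical training configuration mirroring the Experiment Setup row above.
training_config = {
    "base_model": "runwayml/stable-diffusion-v1-5",  # assumed Hub id for SD v1-5
    "dataset": "pick-a-pic-v2",
    "num_gpus": 4,                 # NVIDIA A6000
    "per_gpu_batch_size": 2,
    "optimizer": "Adam",
    "base_learning_rate": 1e-7,
    "lr_warmup_steps": 1000,
    "max_train_steps": 10000,
    "beta": 5000,                  # β in the Diffusion-KTO objective
}
```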