On Discrete Prompt Optimization for Diffusion Models
Authors: Ruochen Wang, Ting Liu, Cho-Jui Hsieh, Boqing Gong
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirical evaluation on prompts collected from diverse sources (DiffusionDB, ChatGPT, COCO) suggests that our method can discover prompts that substantially improve (prompt enhancement) or destroy (adversarial attack) the faithfulness of images generated by the text-to-image diffusion model. |
| Researcher Affiliation | Collaboration | University of California, Los Angeles; Google Research; Google DeepMind. |
| Pseudocode | Yes | Details of the complete DPO-Diff algorithm, including specific hyperparameters, are available in Algorithm 1 of Appendix D and discussed further in Appendix F.1. (Algorithm 1: DPO-Diff solver, Discrete Prompt Optimization Algorithm) |
| Open Source Code | No | The paper does not provide an unambiguous statement or a direct link to the open-source code for the described methodology. |
| Open Datasets | Yes | To evaluate our prompt optimization method for the diffusion model, we collect and filter a set of challenging prompts from diverse sources including DiffusionDB (Wang et al., 2022), COCO (Lin et al., 2014), and ChatGPT (Ouyang et al., 2022). |
| Dataset Splits | No | The paper describes collecting a dataset of prompts for evaluation but does not specify a training/validation split for its own method, which is an optimization framework rather than a model that is trained on such splits. |
| Hardware Specification | No | The paper does not specify any particular hardware (e.g., CPU, GPU models, or cloud computing instances) used for running the experiments. |
| Software Dependencies | No | The paper mentions software such as 'Stable Diffusion v1-4', the 'DDIM sampler', and 'ChatGPT (gpt-4-1106-preview)', along with the 'RMSprop' optimizer, but does not provide specific version numbers for these software dependencies. |
| Experiment Setup | Yes | We use Stable Diffusion v1-4 with a DDIM sampler for all experiments in the main paper. The guidance scale and inference steps are set to 7.5 and 50 respectively (default). (...) The K for the Shortcut Text Gradient is set to 1. (...) we progressively increase t from 15 to 25. (...) We use Gumbel Softmax with temperature 1. (...) We optimize DPO-Diff using RMSprop with a learning rate of 0.1 and momentum of 0.5 for 20 iterations. (...) population size = 20, tournament = top 10, mutation with prob = 0.1 and size = 10, and crossover with size = 10. |
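
The Experiment Setup row lists enough hyperparameters for a rough reconstruction of the pipeline. Below is a minimal sketch, assuming the standard Hugging Face `diffusers` and PyTorch APIs, of how those reported settings could be wired together: Stable Diffusion v1-4 with a DDIM sampler (guidance scale 7.5, 50 inference steps), a Gumbel Softmax relaxation (temperature 1) over candidate word substitutions, and RMSprop (lr 0.1, momentum 0.5) for 20 iterations. The search-space sizes, the `compound_loss` placeholder, and the example prompt are illustrative assumptions, not the authors' implementation.

```python
# Hedged reproduction sketch (not the authors' code): backbone configuration plus
# a Gumbel-Softmax relaxation over discrete word choices, using the hyperparameters
# quoted in the Experiment Setup row.

import torch
import torch.nn.functional as F
from diffusers import StableDiffusionPipeline, DDIMScheduler

device = "cuda" if torch.cuda.is_available() else "cpu"

# Stable Diffusion v1-4 with a DDIM sampler; guidance scale 7.5, 50 steps (paper defaults).
pipe = StableDiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4")
pipe.scheduler = DDIMScheduler.from_config(pipe.scheduler.config)
pipe = pipe.to(device)
GUIDANCE_SCALE, NUM_INFERENCE_STEPS = 7.5, 50

# One relaxed categorical distribution per editable token position; each row holds
# logits over that position's candidate substitute words (sizes are placeholders).
num_positions, num_candidates = 5, 8
candidate_logits = torch.zeros(num_positions, num_candidates,
                               device=device, requires_grad=True)

# RMSprop with lr=0.1 and momentum=0.5, run for 20 iterations, as reported.
optimizer = torch.optim.RMSprop([candidate_logits], lr=0.1, momentum=0.5)

def compound_loss(sampled_one_hot):
    """Placeholder for the paper's Shortcut Text Gradient objective, which
    backpropagates through K=1 denoising steps (with t ramped from 15 to 25).
    Here we just return a dummy differentiable scalar."""
    return sampled_one_hot.sum() * 0.0

for step in range(20):
    # Gumbel-Softmax with temperature 1 yields a differentiable (straight-through)
    # sample of one candidate word per position.
    sampled = F.gumbel_softmax(candidate_logits, tau=1.0, hard=True, dim=-1)
    loss = compound_loss(sampled)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

# Generate an image from a (placeholder) prompt with the reported sampling settings.
image = pipe("a photo of an astronaut riding a horse",
             guidance_scale=GUIDANCE_SCALE,
             num_inference_steps=NUM_INFERENCE_STEPS).images[0]
```

The evolutionary-search parameters reported in the same row (population size 20, tournament of the top 10, mutation with probability 0.1 and size 10, crossover with size 10) would govern a subsequent discrete search stage and are not reproduced in this sketch.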