On Discrete Prompt Optimization for Diffusion Models
Authors: Ruochen Wang, Ting Liu, Cho-Jui Hsieh, Boqing Gong
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirical evaluation on prompts collected from diverse sources (DiffusionDB, ChatGPT, COCO) suggests that our method can discover prompts that substantially improve (prompt enhancement) or destroy (adversarial attack) the faithfulness of images generated by the text-to-image diffusion model. |
| Researcher Affiliation | Collaboration | University of California, Los Angeles; Google Research; Google DeepMind. |
| Pseudocode | Yes | Details of the complete DPO-Diff algorithm, including specific hyperparameters, are available in Algorithm 1 of Appendix D and discussed further in Appendix F.1. (Algorithm 1: DPO-Diff solver, Discrete Prompt Optimization Algorithm) |
| Open Source Code | No | The paper does not provide an unambiguous statement or a direct link to the open-source code for the described methodology. |
| Open Datasets | Yes | To evaluate our prompt optimization method for the diffusion model, we collect and filter a set of challenging prompts from diverse sources including DiffusionDB (Wang et al., 2022), COCO (Lin et al., 2014), and ChatGPT (Ouyang et al., 2022). |
| Dataset Splits | No | The paper describes collecting a dataset of prompts for evaluation but does not specify a training/validation split for its own method, which is an optimization framework rather than a model that is trained on such splits. |
| Hardware Specification | No | The paper does not specify any particular hardware (e.g., CPU, GPU models, or cloud computing instances) used for running the experiments. |
| Software Dependencies | No | The paper mentions software such as 'Stable Diffusion v1-4', the 'DDIM sampler', and 'ChatGPT (gpt-4-1106-preview)', along with the 'RMSprop' optimizer, but does not provide specific version numbers for these software dependencies. |
| Experiment Setup | Yes | We use Stable Diffusion v1-4 with a DDIM sampler for all experiments in the main paper. The guidance scale and inference steps are set to 7.5 and 50 respectively (default). (...) The K for the Shortcut Text Gradient is set to 1. (...) we progressively increase t from 15 to 25. (...) We use Gumbel Softmax with temperature 1. (...) We optimize DPO-Diff using RMSprop with a learning rate of 0.1 and momentum of 0.5 for 20 iterations. (...) population size = 20, tournament = top 10, mutation with prob = 0.1 and size = 10, and crossover with size = 10. |
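
The Experiment Setup row lists enough hyperparameters for a rough reconstruction of the pipeline. Below is a minimal sketch, assuming the standard Hugging Face `diffusers` and PyTorch APIs, of how those reported settings could be wired together: Stable Diffusion v1-4 with a DDIM sampler (guidance scale 7.5, 50 inference steps), a Gumbel Softmax relaxation (temperature 1) over candidate word substitutions, and RMSprop (lr 0.1, momentum 0.5) for 20 iterations. The search-space sizes, the `compound_loss` placeholder, and the example prompt are illustrative assumptions, not the authors' implementation.

```python
# Hedged reproduction sketch (not the authors' code): backbone configuration plus
# a Gumbel-Softmax relaxation over discrete word choices, using the hyperparameters
# quoted in the Experiment Setup row.

import torch
import torch.nn.functional as F
from diffusers import StableDiffusionPipeline, DDIMScheduler

device = "cuda" if torch.cuda.is_available() else "cpu"

# Stable Diffusion v1-4 with a DDIM sampler; guidance scale 7.5, 50 steps (paper defaults).
pipe = StableDiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4")
pipe.scheduler = DDIMScheduler.from_config(pipe.scheduler.config)
pipe = pipe.to(device)
GUIDANCE_SCALE, NUM_INFERENCE_STEPS = 7.5, 50

# One relaxed categorical distribution per editable token position; each row holds
# logits over that position's candidate substitute words (sizes are placeholders).
num_positions, num_candidates = 5, 8
candidate_logits = torch.zeros(num_positions, num_candidates,
                               device=device, requires_grad=True)

# RMSprop with lr=0.1 and momentum=0.5, run for 20 iterations, as reported.
optimizer = torch.optim.RMSprop([candidate_logits], lr=0.1, momentum=0.5)

def compound_loss(sampled_one_hot):
    """Placeholder for the paper's Shortcut Text Gradient objective, which
    backpropagates through K=1 denoising steps (with t ramped from 15 to 25).
    Here we just return a dummy differentiable scalar."""
    return sampled_one_hot.sum() * 0.0

for step in range(20):
    # Gumbel-Softmax with temperature 1 yields a differentiable (straight-through)
    # sample of one candidate word per position.
    sampled = F.gumbel_softmax(candidate_logits, tau=1.0, hard=True, dim=-1)
    loss = compound_loss(sampled)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

# Generate an image from a (placeholder) prompt with the reported sampling settings.
image = pipe("a photo of an astronaut riding a horse",
             guidance_scale=GUIDANCE_SCALE,
             num_inference_steps=NUM_INFERENCE_STEPS).images[0]
```

The evolutionary-search parameters reported in the same row (population size 20, tournament of the top 10, mutation with probability 0.1 and size 10, crossover with size 10) would govern a subsequent discrete search stage and are not reproduced in this sketch.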