Optimizing Prompts for Text-to-Image Generation
Authors: Yaru Hao, Zewen Chi, Li Dong, Furu Wei
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results on Stable Diffusion show that our method outperforms manual prompt engineering in terms of both automatic metrics and human preference ratings. |
| Researcher Affiliation | Industry | Yaru Hao, Zewen Chi, Li Dong, Furu Wei (Microsoft Research); https://github.com/microsoft/LMOps |
| Pseudocode | No | No pseudocode or algorithm blocks were found in the paper. |
| Open Source Code | No | The pretrained checkpoints are available at https://aka.ms/promptist. The demo can be found at https://aka.ms/promptist-demo. The paper does not explicitly state that the source code for the methodology is released. |
| Open Datasets | Yes | We use three types of data: (1) in-domain prompts from DiffusionDB [Wang et al., 2022]... (2) out-of-domain image captions from the COCO dataset [Chen et al., 2015], (3) image labels from ImageNet-21k [Deng et al., 2009]... |
| Dataset Splits | No | The paper mentions 'validation loss' during fine-tuning, but does not provide explicit train/validation/test dataset splits (e.g., percentages or counts) for the datasets used. |
| Hardware Specification | Yes | Our experiments are implemented on V100 (32GB) GPU. |
| Software Dependencies | No | The paper does not provide specific version numbers for software dependencies such as Python, PyTorch, or other libraries used in the implementation. |
| Experiment Setup | Yes | We use a batch size of 256, a learning rate of 5e-5, and a max length of 512. We finetune the model for 15k steps... We train the policy for 12k episodes, four PPO epochs per batch with one minibatch each, with a batch size of 256 and a constant learning rate of 5e-5. The value loss coefficient and the KL reward coefficient are kept at 2.3 and 0.2 respectively. |
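
For readers attempting a reproduction, the hyperparameters quoted in the Experiment Setup row can be gathered into a single configuration sketch covering the two training stages reported in the paper (supervised fine-tuning of the prompt model, then PPO). This is a minimal illustration only: the class and field names below are hypothetical and do not come from the authors' training code, which is not released; the numeric values are the ones stated in the paper.

```python
from dataclasses import dataclass

# Hypothetical configuration objects collecting the hyperparameters
# reported in the paper. Field names are illustrative, not official.


@dataclass
class SupervisedFinetuneConfig:
    batch_size: int = 256        # "a batch size of 256"
    learning_rate: float = 5e-5  # "a learning rate of 5e-5"
    max_length: int = 512        # "a max length of 512"
    total_steps: int = 15_000    # "finetune the model for 15k steps"


@dataclass
class PPOFinetuneConfig:
    episodes: int = 12_000           # "train the policy for 12k episodes"
    ppo_epochs_per_batch: int = 4    # "four PPO epochs per batch"
    minibatches_per_epoch: int = 1   # "with one minibatch each"
    batch_size: int = 256
    learning_rate: float = 5e-5      # constant learning rate (no decay stated)
    value_loss_coef: float = 2.3     # weight on the value-function loss
    kl_reward_coef: float = 0.2      # weight on the KL penalty added to the reward


if __name__ == "__main__":
    # Print the assumed defaults so they can be checked against the paper.
    print(SupervisedFinetuneConfig())
    print(PPOFinetuneConfig())
```

Under this reading, `value_loss_coef` would scale the critic term in the PPO objective and `kl_reward_coef` would scale a KL penalty against the supervised policy added to the reward, which is the standard role of these coefficients in KL-regularized PPO; the paper itself only reports their values (2.3 and 0.2), not the exact objective formulation.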