AP-Adapter: Improving Generalization of Automatic Prompts on Unseen Text-to-Image Diffusion Models
Authors: Yuchen Fu, Zhiwei Jiang, Yuliang Liu, Cong Wang, Zexuan Deng, Zhaoling Chen, Qing Gu
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We curate a multi-modal, multi-model dataset that includes multiple diffusion models and their corresponding text-image data, and conduct experiments under a model generalization setting. The experimental results demonstrate the AP-Adapter's ability to enable the automatic prompts to generalize well to previously unseen diffusion models, generating high-quality images. |
| Researcher Affiliation | Academia | State Key Laboratory for Novel Software Technology, Nanjing University, China yuchenfu@smail.nju.edu.cn, jzw@nju.edu.cn {yuliangliu,cw,dengzx,zhaolingchen}@smail.nju.edu.cn guq@nju.edu.cn |
| Pseudocode | Yes | Algorithm 1 Training pipeline |
| Open Source Code | No | Our contributions include the dataset we collected and the code for model training and testing. We will release the data and code after the paper is accepted. |
| Open Datasets | No | Data Collection. We sourced high-quality images and personalized SD checkpoints from the CIVITAI community. We collected 47,695 image-text pairs gathered from various checkpoints, ensuring privacy protection. Further analysis of our dataset is provided in Appendix B.1. |
| Dataset Splits | Yes | The source domain encompasses 7075 samples, whereas the target domain comprises 3064 samples. |
| Hardware Specification | Yes | In the Prototype-Based Prompt Adaptation stage, all models are trained on two NVIDIA RTX 3090 GPUs, with steps set to 10000, batch size set to 16, and image resolution set to 512. |
| Software Dependencies | Yes | As for the platform to implement our network, we use PyTorch 2.1. |
| Experiment Setup | Yes | During the training phase, we retrieve 5 pairs of natural language prompts and manually designed prompts as demonstrations for ICL from the dataset. [...] For the model's parameter settings, since the source domain data contains 40 checkpoints, the number of domain prototypes S is set to 40. The coefficients γ1, γ2, γ3, γ4 for the loss functions are 0.01, 1.0, 0.001 and 1.0, respectively. [...] with steps set to 10000, batch size set to 16, and image resolution set to 512. |
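The experiment-setup row above lists the paper's concrete hyperparameters. A minimal sketch collecting them into a single configuration object may help when attempting a reproduction; all field names here are hypothetical (the released code was not available), and only the values are taken from the paper:

```python
# Hypothetical reproduction config for AP-Adapter training.
# Field names are illustrative; values are those reported in the paper.
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class APAdapterTrainConfig:
    num_prototypes: int = 40                # S: one prototype per source-domain checkpoint
    loss_weights: Tuple[float, float, float, float] = (0.01, 1.0, 0.001, 1.0)  # γ1..γ4
    train_steps: int = 10_000
    batch_size: int = 16
    image_resolution: int = 512
    icl_demonstrations: int = 5             # retrieved prompt pairs used as ICL demonstrations


cfg = APAdapterTrainConfig()
print(cfg)
```

A frozen dataclass is used so the reported settings cannot be mutated accidentally during an experiment run; any deviation from the paper's values then has to be made explicit via `dataclasses.replace`.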