AP-Adapter: Improving Generalization of Automatic Prompts on Unseen Text-to-Image Diffusion Models

Authors: Yuchen Fu, Zhiwei Jiang, Yuliang Liu, Cong Wang, Zexuan Deng, Zhaoling Chen, Qing Gu

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We curate a multi-modal, multi-model dataset that includes multiple diffusion models and their corresponding text-image data, and conduct experiments under a model generalization setting. The experimental results demonstrate the AP-Adapter's ability to enable the automatic prompts to generalize well to previously unseen diffusion models, generating high-quality images.
Researcher Affiliation | Academia | State Key Laboratory for Novel Software Technology, Nanjing University, China; yuchenfu@smail.nju.edu.cn, jzw@nju.edu.cn, {yuliangliu,cw,dengzx,zhaolingchen}@smail.nju.edu.cn, guq@nju.edu.cn
Pseudocode | Yes | Algorithm 1: Training pipeline
Open Source Code | No | Our contributions include the dataset we collected and the code for model training and testing. We will release the data and code after the paper is accepted.
Open Datasets | No | Data Collection. We sourced high-quality images and personalized SD checkpoints from the CIVITAI community. We collected 47,695 image-text pairs gathered from various checkpoints, ensuring privacy protection. Further analysis of our dataset is provided in Appendix B.1.
Dataset Splits | Yes | The source domain encompasses 7075 samples, whereas the target domain comprises 3064 samples.
Hardware Specification | Yes | In the Prototype-Based Prompt Adaptation stage, all models are trained on two NVIDIA RTX 3090 GPUs, with steps set to 10000, batch size set to 16, and image resolution set to 512.
Software Dependencies | Yes | As for the platform to implement our network, we use PyTorch 2.1.
Experiment Setup | Yes | During the training phase, we retrieve 5 pairs of natural language prompts and manually designed prompts as demonstrations for ICL from the dataset. [...] For the model's parameter settings, since the source domain data contains 40 checkpoints, the number of domain prototypes S is set to 40. The coefficients γ1, γ2, γ3, γ4 for the loss functions are 0.01, 1.0, 0.001 and 1.0, respectively. [...] with steps set to 10000, batch size set to 16, and image resolution set to 512.
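The quoted hyperparameters can be collected into one place for a reproduction attempt. The sketch below is a hypothetical configuration object: the field names are illustrative (not from the paper), while the values are taken directly from the quoted setup.

```python
# Hypothetical training configuration for the Prototype-Based Prompt
# Adaptation stage. Field names are illustrative; only the values are
# taken from the paper's quoted experiment setup.
from dataclasses import dataclass, field


@dataclass
class APAdapterConfig:
    num_icl_demonstrations: int = 5       # prompt pairs retrieved for ICL
    num_domain_prototypes: int = 40       # S; one per source-domain checkpoint
    loss_coefficients: tuple = (0.01, 1.0, 0.001, 1.0)  # gamma_1..gamma_4
    train_steps: int = 10_000
    batch_size: int = 16
    image_resolution: int = 512
    num_gpus: int = 2                     # two NVIDIA RTX 3090s


def weighted_loss(cfg: APAdapterConfig, loss_terms: list) -> float:
    """Combine the four loss terms with their quoted coefficients."""
    return sum(g * l for g, l in zip(cfg.loss_coefficients, loss_terms))


cfg = APAdapterConfig()
```

The `weighted_loss` helper only illustrates how the four coefficients would weight their corresponding loss terms; the actual loss definitions are in the paper.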