Customizable Image Synthesis with Multiple Subjects
Authors: Zhiheng Liu, Yifei Zhang, Yujun Shen, Kecheng Zheng, Kai Zhu, Ruili Feng, Yu Liu, Deli Zhao, Jingren Zhou, Yang Cao
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Both qualitative and quantitative experimental results demonstrate our superiority over state-of-the-art alternatives under a variety of settings for multi-subject customization. Project page can be found here. ... 4 Experiments ... 4.1 Experimental setups ... 4.2 Main results ... 4.4 Ablation studies |
| Researcher Affiliation | Collaboration | 1USTC 2SJTU 3Ant Group 4Alibaba Group |
| Pseudocode | Yes | Algorithm 1 N-Subject Customization with Layout Guidance |
| Open Source Code | Yes | Project page can be found here. |
| Open Datasets | Yes | For fair and unbiased evaluation, we select subjects from previous papers [15, 18, 16, 17] spanning various categories for a total of 15 customized subjects. |
| Dataset Splits | No | The paper describes training and testing procedures but does not explicitly provide details on a validation dataset split or strategy. |
| Hardware Specification | Yes | All experiments are conducted using one A-100 GPU. |
| Software Dependencies | No | The paper mentions using 'Stable Diffusion v2-1-base' as the pre-trained model and refers to 'huggingface [39]' for implementations, but it does not provide specific version numbers for software dependencies like Python, PyTorch, or the Hugging Face libraries themselves. |
| Experiment Setup | Yes | Textual Inversion [18]... batch size of 4 and a learning rate of 0.002 for 3000 steps. ... Ours. ... batch size of 1 and a learning rate of 1 × 10⁻⁶ for 3,000 steps. At inference time... We use a positive value of +2.5 to strengthen the signal of the target subject and we use a negative value of 1 10 5 to weaken the signal of irrelevant subjects. Furthermore, we guide all 50 steps with the layout guidance in the whole generation process to get good customized generation results. |
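
The inference-time setting quoted in the Experiment Setup row (a positive value added for the target subject, a negative value added for irrelevant subjects) can be illustrated with a minimal sketch. This is not the paper's Algorithm 1 or the authors' code: the function name `apply_layout_guidance`, the dictionary-of-grids data layout, and the rectangular masks are assumptions made for illustration. Only the +2.5 bonus and the 50 guided steps come from the text above; the negative value in the usage lines is an arbitrary placeholder, because the extracted value is garbled.

```python
# Illustrative sketch only; names and data layout are hypothetical.
POSITIVE_BONUS = 2.5     # from the quoted setup: strengthens the target subject
NUM_GUIDED_STEPS = 50    # from the quoted setup: all 50 steps are guided


def apply_layout_guidance(scores, token_regions, positive, negative):
    """Return a biased copy of per-token attention score grids.

    scores:        dict mapping a subject token to a 2D list of scores (H x W)
    token_regions: dict mapping the same tokens to 2D 0/1 masks (H x W)
    positive:      value added inside the token's own region
    negative:      value added inside regions that belong to other subjects
    """
    guided = {}
    for token, grid in scores.items():
        own_mask = token_regions[token]
        other_masks = [m for t, m in token_regions.items() if t != token]
        new_grid = []
        for i, row in enumerate(grid):
            new_row = []
            for j, value in enumerate(row):
                if own_mask[i][j]:
                    value += positive   # boost the target subject here
                elif any(m[i][j] for m in other_masks):
                    value += negative   # suppress it in other subjects' regions
                new_row.append(value)
            new_grid.append(new_row)
        guided[token] = new_grid
    return guided


# Usage: two subjects on a 1 x 2 grid, each owning one cell.
scores = {"dog": [[0.0, 0.0]], "cat": [[0.0, 0.0]]}
regions = {"dog": [[1, 0]], "cat": [[0, 1]]}
# The negative value below is a placeholder, not the paper's setting.
guided = apply_layout_guidance(scores, regions, POSITIVE_BONUS, negative=-1.0)
```

In this sketch the bias would be applied once per denoising step, for all `NUM_GUIDED_STEPS` steps. Where and how the adjustment is applied to the attention maps in the actual method is specified in the paper, not here.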