DisenBooth: Identity-Preserving Disentangled Tuning for Subject-Driven Text-to-Image Generation
Authors: Hong Chen, Yipeng Zhang, Simin Wu, Xin Wang, Xuguang Duan, Yuwei Zhou, Wenwu Zhu
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments show that our proposed DisenBooth framework outperforms baseline models for subject-driven text-to-image generation with the identity-preserved embedding. |
| Researcher Affiliation | Academia | Department of Computer Science and Technology, Tsinghua University; Beijing National Research Center for Information Science and Technology; Lanzhou University |
| Pseudocode | No | The paper describes its methods through text and equations but does not include any explicit pseudocode or algorithm blocks. |
| Open Source Code | Yes | Our code is available at https://github.com/forchchch/DisenBooth |
| Open Datasets | Yes | We adopt the subject-driven text-to-image generation dataset DreamBench proposed by Ruiz et al. (2022), whose images are downloaded from Unsplash. This dataset contains 30 subjects, including unique objects like backpacks, stuffed animals, cats, etc. |
| Dataset Splits | No | The paper describes using a small set of images for finetuning (3-5 images per subject) and the DreamBench dataset for evaluation, but it does not specify explicit train/validation/test splits for the DreamBench dataset or for the finetuning process in general. |
| Hardware Specification | Yes | The finetuning process is conducted on one Tesla V100 with batch size of 1, while the finetuning iterations are 3,000. |
| Software Dependencies | No | The paper mentions implementing based on 'Stable Diffusion 2-1' and using the 'AdamW' optimizer, but it does not provide specific version numbers for software dependencies like Python, PyTorch, or CUDA. |
| Experiment Setup | Yes | The learning rate is 1e-4 with the AdamW (Loshchilov & Hutter, 2018) optimizer. The finetuning process is conducted on one Tesla V100 with a batch size of 1, while the finetuning iterations are 3,000. As for the LoRA rank, we use r = 4 for all the experiments. We use λ2 = 0.01 for all our experiments. λ3 is a hyper-parameter which is set to 0.001 for all our experiments. (These values are collected in the configuration sketch below the table.) |
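
The hyperparameters quoted in the Experiment Setup row can be gathered into a small configuration object. The sketch below is a minimal, hypothetical illustration, not the authors' training script (that lives in the linked repository): the class and variable names, the Hugging Face model identifier, and the placeholder parameter list are assumptions, while the numeric values are the ones reported in the paper.

```python
# Minimal sketch of the reported DisenBooth finetuning setup.
# Only the numeric values come from the paper; all names are hypothetical.
from dataclasses import dataclass

import torch


@dataclass
class FinetuneConfig:
    # Assumed Hugging Face id for the "Stable Diffusion 2-1" base model named in the paper.
    pretrained_model: str = "stabilityai/stable-diffusion-2-1"
    learning_rate: float = 1e-4   # AdamW learning rate
    batch_size: int = 1           # run on a single Tesla V100
    max_iterations: int = 3000    # finetuning iterations
    lora_rank: int = 4            # LoRA rank r
    lambda2: float = 0.01         # loss weight lambda_2 reported in the paper
    lambda3: float = 0.001        # loss weight lambda_3 reported in the paper


cfg = FinetuneConfig()

# The paper specifies AdamW with lr = 1e-4; `trainable_params` stands in for
# the LoRA/adapter parameters that would actually be finetuned.
trainable_params = [torch.nn.Parameter(torch.zeros(1))]  # placeholder
optimizer = torch.optim.AdamW(trainable_params, lr=cfg.learning_rate)
```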