DisenBooth: Identity-Preserving Disentangled Tuning for Subject-Driven Text-to-Image Generation

Authors: Hong Chen, Yipeng Zhang, Simin Wu, Xin Wang, Xuguang Duan, Yuwei Zhou, Wenwu Zhu

ICLR 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments show that our proposed DisenBooth framework outperforms baseline models for subject-driven text-to-image generation with the identity-preserved embedding.
Researcher Affiliation | Academia | 1 Department of Computer Science and Technology, Tsinghua University; 2 Beijing National Research Center for Information Science and Technology; 3 Lanzhou University
Pseudocode | No | The paper describes its methods through text and equations but does not include any explicit pseudocode or algorithm blocks.
Open Source Code | Yes | Our code is available at https://github.com/forchchch/DisenBooth
Open Datasets | Yes | We adopt the subject-driven text-to-image generation dataset DreamBench proposed by Ruiz et al. (2022), with images downloaded from Unsplash. This dataset contains 30 subjects, including unique objects like backpacks, stuffed animals, cats, etc.
Dataset Splits | No | The paper describes using a small set of images for finetuning (3-5 images per subject) and the DreamBench dataset for evaluation, but it does not specify explicit train/validation/test splits for DreamBench or for the finetuning process in general.
Hardware Specification | Yes | The finetuning process is conducted on one Tesla V100 with a batch size of 1, while the finetuning iterations are 3,000.
Software Dependencies | No | The paper mentions implementing based on 'Stable Diffusion 2-1' and using the 'AdamW' optimizer, but it does not provide specific version numbers for software dependencies like Python, PyTorch, or CUDA.
Experiment Setup | Yes | The learning rate is 1e-4 with the AdamW (Loshchilov & Hutter, 2018) optimizer. The finetuning process is conducted on one Tesla V100 with a batch size of 1, while the finetuning iterations are 3,000. As for the LoRA rank, we use r = 4 for all the experiments. We use λ2 = 0.01 for all our experiments. λ3 is a hyper-parameter set to 0.001 for all our experiments.
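The hyperparameters quoted in the Experiment Setup row can be collected into a minimal PyTorch sketch. This is not the authors' training code (that lives in their repository); the `LoRAAdapter` module here is a generic low-rank adapter stand-in, and the constant names are my own labels for the reported values.

```python
import torch

# Values reported in the paper's experiment setup (the names are mine).
LEARNING_RATE = 1e-4   # AdamW learning rate
LORA_RANK = 4          # r = 4 for all experiments
LAMBDA_2 = 0.01        # weight of the second loss term
LAMBDA_3 = 0.001       # weight of the third loss term
BATCH_SIZE = 1
FINETUNE_ITERS = 3000

class LoRAAdapter(torch.nn.Module):
    """Generic LoRA-style adapter: a rank-r down/up projection pair
    whose output is added to a frozen layer's output. Illustrative only."""

    def __init__(self, dim: int, rank: int = LORA_RANK):
        super().__init__()
        self.down = torch.nn.Linear(dim, rank, bias=False)
        self.up = torch.nn.Linear(rank, dim, bias=False)
        # Zero-init the up-projection so the adapter starts as a no-op,
        # the usual LoRA initialization.
        torch.nn.init.zeros_(self.up.weight)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.up(self.down(x))

# Only the adapter parameters would be finetuned, with AdamW as reported.
adapter = LoRAAdapter(dim=64)
optimizer = torch.optim.AdamW(adapter.parameters(), lr=LEARNING_RATE)
```

With the zero-initialized up-projection, the adapter's initial output is all zeros, so finetuning starts from the pretrained model's behavior and the low-rank update is learned over the 3,000 iterations.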