AttnDreamBooth: Towards Text-Aligned Personalized Text-to-Image Generation
Authors: Lianyu Pang, Jian Yin, Baoquan Zhao, Feize Wu, Fu Lee Wang, Qing Li, Xudong Mao
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we first present the implementation details of our method. Subsequently, we evaluate its performance by conducting a comparative analysis with four state-of-the-art personalization methods. Lastly, we conduct an ablation study to demonstrate the effectiveness of each sub-module. |
| Researcher Affiliation | Academia | 1Sun Yat-sen University 2Hong Kong Metropolitan University 3The Hong Kong Polytechnic University |
| Pseudocode | No | The paper describes its method in prose and through diagrams (e.g., Figure 3) but does not include structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | The NeurIPS checklist explicitly states: 'Question: Does the paper provide open access to the data and code... Answer: [Yes] Justification: We have made the code publicly available with sufficient instructions.' Additionally, a project website link 'https://attndreambooth.github.io' is provided on the first page. |
| Open Datasets | Yes | The datasets used for evaluation are from TI [24] and DB [71]. The data from DB is under the Unsplash license, while the license information for the data from TI is not available online. |
| Dataset Splits | No | The paper describes training stages and hyperparameter tuning (e.g., learning rates, batch sizes), and mentions quantitative evaluation using 24 text prompts. However, it does not explicitly define specific training, validation, and test dataset splits by percentages or sample counts. |
| Hardware Specification | Yes | All experiments are conducted on a single Nvidia A100 GPU. |
| Software Dependencies | Yes | Our implementation is based on the publicly available Stable Diffusion V2.1 [70]. |
| Experiment Setup | Yes | We keep a fixed batch size of 8 across all training stages but vary the learning rates and training steps. Specifically, we train with a learning rate of 10^-3 for 60 steps in stage 1, followed by a learning rate of 2×10^-5 for 100 steps in stage 2, and conclude with a learning rate of 2×10^-6 for 500 steps in stage 3. λ_μ and λ_σ are set to 0.1 and 0 in stage 1, respectively, and are adjusted to 2 and 5 in subsequent stages. |
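
To make the reported setup concrete, the sketch below collects the stage-wise hyperparameters from the Experiment Setup row into a plain Python configuration and loads the Stable Diffusion 2.1 backbone noted under Software Dependencies. The names `STAGE_CONFIGS`, `BATCH_SIZE`, and the config structure are illustrative assumptions and not the authors' API; the actual three-stage training script is in the code released via the project page.

```python
# Hypothetical sketch of the three-stage schedule reported above.
# Only the numbers (batch size, learning rates, steps, lambda values) come
# from the paper; the dict layout and variable names are assumptions.
import torch
from diffusers import StableDiffusionPipeline

BATCH_SIZE = 8  # fixed across all three stages, per the paper

STAGE_CONFIGS = [
    # Stage 1: per the paper's design, the new concept's textual embedding is learned.
    {"stage": 1, "learning_rate": 1e-3, "steps": 60,  "lambda_mu": 0.1, "lambda_sigma": 0.0},
    # Stage 2: cross-attention layers are fine-tuned; regularization weights are raised.
    {"stage": 2, "learning_rate": 2e-5, "steps": 100, "lambda_mu": 2.0, "lambda_sigma": 5.0},
    # Stage 3: the full U-Net is fine-tuned at a lower learning rate for more steps.
    {"stage": 3, "learning_rate": 2e-6, "steps": 500, "lambda_mu": 2.0, "lambda_sigma": 5.0},
]

# Backbone from the Software Dependencies row: publicly available Stable Diffusion 2.1.
# Hardware row reports a single Nvidia A100, hence the single-GPU placement here.
pipe = StableDiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-1", torch_dtype=torch.float16
).to("cuda")

for cfg in STAGE_CONFIGS:
    print(f"stage {cfg['stage']}: lr={cfg['learning_rate']}, "
          f"steps={cfg['steps']}, batch={BATCH_SIZE}")
    # The per-stage optimization itself (which parameters are unfrozen and how
    # lambda_mu / lambda_sigma enter the loss) is defined in the authors' released code.
```

This is a configuration sketch only; reproducing the reported results would require plugging these values into the authors' publicly released training pipeline rather than this loop.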