Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
MultiBooth: Towards Generating All Your Concepts in an Image from Text
Authors: Chenyang Zhu, Kai Li, Yue Ma, Chunming He, Xiu Li
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | MultiBooth surpasses various baselines in both qualitative and quantitative evaluations, showcasing its superior performance and computational efficiency. Our approach is extensively validated with various representative subjects, including pets, objects, scenes, etc. The results from both qualitative and quantitative comparisons highlight the advantages of our approach in terms of concept fidelity and prompt alignment capability. We conduct comparisons between our method and four existing methods: Textual Inversion (TI) (Gal et al. 2022), DreamBooth (DB) (Ruiz et al. 2023), Custom Diffusion (CD) (Kumari et al. 2023), and Cones2 (Liu et al. 2023). Quantitative comparison. We assess all the methods using three evaluation metrics: CLIP-I, Seg CLIP-I, and CLIP-T. |
| Researcher Affiliation | Collaboration | 1 Tsinghua Shenzhen International Graduate School, Tsinghua University, Shenzhen, China; 2 Meta Platforms, Inc., USA; 3 The Hong Kong University of Science and Technology, Hong Kong; 4 Duke University, Durham, USA |
| Pseudocode | No | The paper describes the methodology in prose and includes equations and figures, but it does not contain a clearly labeled 'Pseudocode' or 'Algorithm' block, nor structured steps formatted like code. |
| Open Source Code | No | The paper does not contain an explicit statement about releasing source code, nor does it provide a link to a code repository. |
| Open Datasets | Yes | Datasets. Following Custom Diffusion (Kumari et al. 2023), we conduct experiments on twelve subjects selected from the DreamBooth dataset (Ruiz et al. 2023) and CustomConcept101 (Kumari et al. 2023). |
| Dataset Splits | No | The paper mentions selecting text prompts from CLIP Image Net templates and following Textual Inversion for training, but it does not provide specific details regarding the training/test/validation splits for the image datasets used. |
| Hardware Specification | Yes | All of our experiments are based on Stable Diffusion v1.5 and are conducted on one RTX3090. |
| Software Dependencies | No | The paper mentions 'Stable Diffusion v1.5' as the foundational model and techniques like 'LoRA' and 'QFormer', but it does not specify any ancillary software libraries, packages, or solvers with their version numbers that would be required to replicate the experiments. |
| Experiment Setup | Yes | During training, we optimize for 900 steps with a learning rate of 8 × 10⁻⁵. During inference, we sample for 100 steps with the guidance scale ω = 7.5. |
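The CLIP-I and CLIP-T metrics cited above are conventionally computed as average cosine similarity between CLIP embeddings (image–image for CLIP-I, image–text for CLIP-T). The paper does not publish its scoring code, so the following is a minimal, hypothetical sketch of that computation assuming the embeddings have already been extracted with a CLIP encoder; the function names are illustrative, not the authors' implementation.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def clip_score(gen_embeds: np.ndarray, ref_embeds: np.ndarray) -> float:
    """Average pairwise cosine similarity between generated-image embeddings
    and reference embeddings: reference-image embeddings for a CLIP-I-style
    score, or prompt text embeddings for a CLIP-T-style score."""
    sims = [cosine_similarity(g, r) for g in gen_embeds for r in ref_embeds]
    return float(np.mean(sims))

# Toy usage with dummy 2-D embeddings in place of real CLIP features.
generated = np.array([[1.0, 0.0], [0.0, 1.0]])
references = np.array([[1.0, 0.0], [0.0, 1.0]])
score = clip_score(generated, references)
```

Seg CLIP-I follows the same pattern but embeds segmented subject crops rather than full images, which the sketch would accommodate by changing only the inputs.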