Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

MultiBooth: Towards Generating All Your Concepts in an Image from Text

Authors: Chenyang Zhu, Kai Li, Yue Ma, Chunming He, Xiu Li

AAAI 2025 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type Experimental MultiBooth surpasses various baselines in both qualitative and quantitative evaluations, showcasing its superior performance and computational efficiency. Our approach is extensively validated with various representative subjects, including pets, objects, scenes, etc. The results from both qualitative and quantitative comparisons highlight the advantages of our approach in terms of concept fidelity and prompt alignment capability. We conduct comparisons between our method and four existing methods: Textual Inversion (TI) (Gal et al. 2022), DreamBooth (DB) (Ruiz et al. 2023), Custom Diffusion (CD) (Kumari et al. 2023), and Cones2 (Liu et al. 2023). Quantitative comparison. We assess all the methods using three evaluation metrics: CLIP-I, Seg CLIP-I, and CLIP-T.
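The CLIP-I metric quoted above is conventionally computed as the average pairwise cosine similarity between CLIP image embeddings of generated and reference images (Seg CLIP-I applies the same similarity to segmented foregrounds). The sketch below assumes embeddings have already been extracted with a CLIP image encoder; the function name and shapes are ours, not the paper's:

```python
import numpy as np

def clip_i(gen_embeds: np.ndarray, ref_embeds: np.ndarray) -> float:
    """Mean pairwise cosine similarity between generated-image and
    reference-image CLIP embeddings, shapes (n, d) and (m, d).
    Embedding extraction happens upstream (e.g. a CLIP image encoder)."""
    g = gen_embeds / np.linalg.norm(gen_embeds, axis=1, keepdims=True)
    r = ref_embeds / np.linalg.norm(ref_embeds, axis=1, keepdims=True)
    # (n, m) matrix of cosine similarities, averaged to a single score
    return float((g @ r.T).mean())
```

CLIP-T follows the same pattern but compares image embeddings against the CLIP text embedding of the prompt, measuring prompt alignment rather than concept fidelity.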
Researcher Affiliation Collaboration 1 Tsinghua Shenzhen International Graduate School, Tsinghua University, Shenzhen, China; 2 Meta Platforms, Inc., USA; 3 The Hong Kong University of Science and Technology, Hong Kong; 4 Duke University, Durham, USA
Pseudocode No The paper describes the methodology in prose and includes equations and figures, but it does not contain a clearly labeled 'Pseudocode' or 'Algorithm' block, nor structured steps formatted like code.
Open Source Code No The paper does not contain an explicit statement about releasing source code, nor does it provide a link to a code repository.
Open Datasets Yes Datasets. Following Custom Diffusion (Kumari et al. 2023), we conduct experiments on twelve subjects selected from the DreamBooth dataset (Ruiz et al. 2023) and CustomConcept101 (Kumari et al. 2023).
Dataset Splits No The paper mentions selecting text prompts from CLIP ImageNet templates and following Textual Inversion for training, but it does not provide specific details regarding the training/test/validation splits for the image datasets used.
Hardware Specification Yes All of our experiments are based on Stable Diffusion v1.5 and are conducted on one RTX3090.
Software Dependencies No The paper mentions 'Stable Diffusion v1.5' as the foundational model and techniques like 'LoRA' and 'QFormer', but it does not specify any ancillary software libraries, packages, or solvers with their version numbers that would be required to replicate the experiments.
Experiment Setup Yes During training, we optimize for 900 steps with a learning rate of 8 × 10⁻⁵. During inference, we sample for 100 steps with the guidance scale ω = 7.5.
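The guidance scale ω = 7.5 quoted above is the standard classifier-free guidance weight used in Stable Diffusion sampling: at each denoising step, the guided noise prediction extrapolates from the unconditional prediction toward the text-conditional one. A minimal sketch of that combination rule (function name ours, operating on plain arrays rather than a real UNet's outputs):

```python
import numpy as np

def cfg_noise(eps_uncond: np.ndarray,
              eps_cond: np.ndarray,
              omega: float = 7.5) -> np.ndarray:
    """Classifier-free guidance: combine unconditional and
    text-conditional noise predictions with guidance scale omega.
    omega = 1.0 recovers the purely conditional prediction;
    omega > 1.0 pushes samples toward the text condition."""
    return eps_uncond + omega * (eps_cond - eps_uncond)
```

In a real sampler this combined prediction feeds the scheduler's update step at each of the 100 inference steps reported in the paper.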