Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Resolving Multi-Condition Confusion for Finetuning-Free Personalized Image Generation

Authors: Qihan Huang, Siming Fu, Jinlong Liu, Hao Jiang, Yipeng Yu, Jie Song

AAAI 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "The experiments verify that our method achieves superior performance to the state-of-the-arts on the Concept101 dataset and DreamBooth dataset of multi-object personalized image generation, and remarkably improves the performance on single-object personalized image generation. We perform comprehensive experiments to validate the performance of our proposed framework. Experiment results demonstrate that with only 100,000 high-quality images (0.13% of the dataset from Subject Diffusion) selected from SA-1B, our model achieves state-of-the-art performance on the Concept101 dataset and DreamBooth dataset of multi-object personalized image generation."
Researcher Affiliation | Collaboration | Qihan Huang* (1,2), Siming Fu* (2), Jinlong Liu (2), Hao Jiang (2), Yipeng Yu (2), Jie Song (1); 1: Zhejiang University, 2: Alibaba Group
Pseudocode | No | The paper describes its methods in text and mathematical formulas and presents the overall framework in diagrams (Figure 3), but it does not contain clearly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | Code: https://github.com/hqhQAQ/MIP-Adapter
Open Datasets | Yes | "Specifically, this dataset is constructed from the open-sourced SA-1B dataset (Kirillov et al. 2023) consisting of about 11 million images with multiple objects. ... our model achieves state-of-the-art performance on both the Concept101 dataset and DreamBooth dataset of multi-object personalized image generation."
Dataset Splits | No | "This work trains the pre-trained finetuning-free model with the weighted-merge method on a multi-object dataset. Specifically, this dataset is constructed from the open-sourced SA-1B dataset (Kirillov et al. 2023) ... then utilize 100,000 images with the highest S_object quality for training."
Hardware Specification | No | "During training, we adopt AdamW optimizer with a learning rate of 1e-4, and train the model on 8 PPUs for 30,000 steps with a batch size of 4 per PPU."
Software Dependencies | No | The paper mentions using the 'SDXL model (Podell et al. 2023)' and the 'SDXL plus (Jaegle et al. 2021) model' as diffusion models, 'OpenCLIP ViT-bigG/14' as the image encoder, and the AdamW optimizer. However, no version numbers are given for underlying software such as Python, PyTorch, or CUDA.
Experiment Setup | Yes | "During training, we adopt AdamW optimizer with a learning rate of 1e-4, and train the model on 8 PPUs for 30,000 steps with a batch size of 4 per PPU. To enable classifier-free guidance, we use a probability of 0.05 to drop text and image individually, and a probability of 0.05 to drop text and image simultaneously."
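The dataset construction quoted above (keeping the 100,000 SA-1B images with the highest S_object quality) amounts to a top-k selection by score. A minimal sketch, where `records` and the `s_object` field are hypothetical names, not the authors' actual data format:

```python
def select_training_images(records, k=100_000):
    """Keep the k images with the highest object-quality score.

    `records` is assumed to be a list of dicts, each carrying a
    hypothetical 's_object' quality score; the paper selects the
    100,000 highest-scoring images from SA-1B for training.
    """
    return sorted(records, key=lambda r: r["s_object"], reverse=True)[:k]
```

This mirrors the filtering step only; how S_object is computed is described in the paper itself.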
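The classifier-free-guidance setup quoted in the Experiment Setup row (dropping text and image conditions individually with probability 0.05, and jointly with probability 0.05) can be sketched as follows. This is an illustrative helper, not the authors' code; the function and argument names are assumptions:

```python
import random

def apply_cfg_dropout(text_emb, image_emb, null_text, null_image,
                      p_indep=0.05, p_joint=0.05):
    """Randomly replace conditioning inputs with null embeddings,
    following the quoted setup: drop both conditions together with
    probability p_joint, and each one independently with p_indep.
    (Hypothetical sketch; not the paper's implementation.)
    """
    if random.random() < p_joint:
        # Drop text and image simultaneously.
        return null_text, null_image
    if random.random() < p_indep:
        text_emb = null_text
    if random.random() < p_indep:
        image_emb = null_image
    return text_emb, image_emb
```

Training with these occasional null conditions is what lets the model be sampled with classifier-free guidance, interpolating between conditional and unconditional predictions at inference time.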