Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Resolving Multi-Condition Confusion for Finetuning-Free Personalized Image Generation
Authors: Qihan Huang, Siming Fu, Jinlong Liu, Hao Jiang, Yipeng Yu, Jie Song
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The experiments verify that our method achieves superior performance to the state-of-the-arts on the Concept101 dataset and DreamBooth dataset of multi-object personalized image generation, and remarkably improves the performance on single-object personalized image generation. We perform comprehensive experiments to validate the performance of our proposed framework. Experiment results demonstrate that with only 100,000 high-quality images (0.13% of the dataset from Subject Diffusion) selected from SA-1B, our model achieves state-of-the-art performance on the Concept101 dataset and DreamBooth dataset of multi-object personalized image generation. |
| Researcher Affiliation | Collaboration | Qihan Huang*1, 2, Siming Fu*2, Jinlong Liu2, Hao Jiang2, Yipeng Yu2, Jie Song1 1 Zhejiang University 2 Alibaba Group EMAIL, EMAIL, EMAIL, EMAIL, EMAIL |
| Pseudocode | No | The paper describes methods using text and mathematical formulas and presents overall frameworks in diagrams (Figure 3), but it does not contain clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code https://github.com/hqhQAQ/MIP-Adapter |
| Open Datasets | Yes | Specifically, this dataset is constructed from the open-sourced SA-1B dataset (Kirillov et al. 2023) consisting of about 11 million images with multiple objects. ...our model achieves state-of-the-art performance on both the Concept101 dataset and DreamBooth dataset of multi-object personalized image generation. |
| Dataset Splits | No | This work trains the pre-trained finetuning-free model with the weighted-merge method on a multi-object dataset. Specifically, this dataset is constructed from the open-sourced SA-1B dataset (Kirillov et al. 2023) ... then utilize 100,000 images with the highest S_object quality for training. |
| Hardware Specification | No | During training, we adopt the AdamW optimizer with a learning rate of 1e-4, and train the model on 8 PPUs for 30,000 steps with a batch size of 4 per PPU. |
| Software Dependencies | No | The paper mentions using the 'SDXL model (Podell et al. 2023)' and 'SDXL Plus (Jaegle et al. 2021) model' as diffusion models, and 'OpenCLIP ViT-bigG/14 as the image encoder', along with the 'AdamW optimizer'. However, no specific version numbers for underlying software libraries like Python, PyTorch, or CUDA are provided. |
| Experiment Setup | Yes | During training, we adopt the AdamW optimizer with a learning rate of 1e-4, and train the model on 8 PPUs for 30,000 steps with a batch size of 4 per PPU. To enable classifier-free guidance, we use a probability of 0.05 to drop text and image individually, and a probability of 0.05 to drop text and image simultaneously. |
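The classifier-free-guidance dropout described in the experiment setup (drop text and image individually with probability 0.05, and both simultaneously with probability 0.05) can be sketched as follows. This is an illustrative reconstruction, not the authors' code; all function and argument names (`drop_conditions`, `null_text`, `null_image`, etc.) are hypothetical.

```python
import random

def drop_conditions(text_emb, image_emb, null_text, null_image,
                    p_indep=0.05, p_joint=0.05, rng=random):
    """Condition dropout for classifier-free guidance training.

    With probability p_joint both conditions are replaced by their
    null embeddings; otherwise each condition is independently
    replaced with probability p_indep. Names are illustrative only.
    """
    # Joint drop: replace both conditions at once.
    if rng.random() < p_joint:
        return null_text, null_image
    # Independent drops: each condition may be nulled on its own.
    if rng.random() < p_indep:
        text_emb = null_text
    if rng.random() < p_indep:
        image_emb = null_image
    return text_emb, image_emb
```

Applied per training sample, this yields roughly 5% of batches with both conditions dropped and about 5% each with only text or only image dropped, which is what enables guidance over both condition types at inference time.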