Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Multi-LoRA Composition for Image Generation

Authors: Ming Zhong, Yelong Shen, Shuohang Wang, Yadong Lu, Yizhu Jiao, Siru Ouyang, Donghan Yu, Jiawei Han, Weizhu Chen

TMLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | To evaluate the proposed approaches, we establish ComposLoRA, a new comprehensive testbed as part of this research. It features a diverse range of LoRA categories with 480 composition sets. Utilizing an evaluation framework based on GPT-4V, our findings demonstrate a clear improvement in performance with our methods over the prevalent baseline, particularly evident when increasing the number of LoRAs in a composition. The code, benchmarks, LoRA weights, and all evaluation details are available on our project website.
Researcher Affiliation | Collaboration | Ming Zhong (1), Yelong Shen (2), Shuohang Wang (2), Yadong Lu (2), Yizhu Jiao (1), Siru Ouyang (1), Donghan Yu (2), Jiawei Han (1), Weizhu Chen (2). (1) University of Illinois Urbana-Champaign, (2) Microsoft Corporation
Pseudocode | No | The paper describes methods like LoRA Switch and LoRA Composite in detail but does not present them within a formal pseudocode or algorithm block. The procedures are explained in paragraph form, using mathematical equations to describe the core logic.
Open Source Code | Yes | The code, benchmarks, LoRA weights, and all evaluation details are available on our project website.
Open Datasets | Yes | Experimentally, we introduce ComposLoRA, the first testbed specifically designed for LoRA-based composable image generation. This testbed builds upon a collection of public LoRAs, which are extensively shared and recognized as essential plug-in modules in this field. (Footnote 1: Collected from https://civitai.com/.)
Dataset Splits | No | The paper describes the composition sets used for evaluation within the ComposLoRA testbed (e.g., "48 sets comprising 2 LoRAs, 144 sets with 3 LoRAs, 192 sets featuring 4 LoRAs, and 96 sets containing 5 LoRAs"). However, it does not specify traditional train/test/validation splits for model training or for the evaluation of the composition methods, as the methods are training-free and operate on pre-trained LoRAs.
Hardware Specification | Yes | Since the proposed methods do not require additional training, all experiments are conducted on a single A6000 GPU.
Software Dependencies | No | For our experiments, we employ stable-diffusion-v1.5 (Rombach et al., 2022) as the backbone model. We utilize two specific checkpoints for our experiments: Realistic_Vision_V5.1 for realistic images and Counterfeit-V2.5 for anime images, each fine-tuned to their respective styles... The DPM-Solver++ (Lu et al., 2022a;b) is used as the scheduler in the generation process. The paper mentions software tools and models but does not provide specific version numbers for software dependencies beyond the model name (stable-diffusion-v1.5) and scheduler (DPM-Solver++).
Experiment Setup | Yes | For our experiments, we employ stable-diffusion-v1.5 (Rombach et al., 2022) as the backbone model. We utilize two specific checkpoints for our experiments: Realistic_Vision_V5.1 for realistic images and Counterfeit-V2.5 for anime images, each fine-tuned to their respective styles. In the realistic style subset, we configure the model with 100 denoising steps, a guidance scale s of 7, and set the image size to 1024x768, optimizing for superior image quality. For the anime style subset, the settings differ slightly with 200 denoising steps, a guidance scale s of 10, and an image size of 512x512. The DPM-Solver++ (Lu et al., 2022a;b) is used as the scheduler in the generation process. The weight scale w is consistently set at 0.8 for composing LoRAs within ComposLoRA. For the LoRA Switch approach, we apply a cycle with τ set to 5, meaning every 5 denoising steps activate the next LoRA in the sequence: character, clothing, style, background, then object. To ensure the reliability of our experimental results, we conduct image generation using three random seeds.
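
The LoRA Switch cycle quoted in the Experiment Setup entry (τ = 5, order: character, clothing, style, background, object) can be illustrated with a short Python sketch. This is not the authors' code: the function names `active_lora` and `switch_schedule` are hypothetical, steps are assumed to be counted from 0, and the sequence is assumed to wrap around to the first category after the last one.

```python
from typing import List, Sequence

# Activation order as quoted in the experiment setup.
LORA_ORDER = ("character", "clothing", "style", "background", "object")


def active_lora(step: int, tau: int = 5, order: Sequence[str] = LORA_ORDER) -> str:
    """Return the LoRA category active at a 0-based denoising step.

    Assumption: the active LoRA changes every `tau` steps and the
    sequence wraps around after the last category.
    """
    if step < 0 or tau <= 0 or not order:
        raise ValueError("step must be >= 0, tau > 0, and order non-empty")
    return order[(step // tau) % len(order)]


def switch_schedule(num_steps: int, tau: int = 5,
                    order: Sequence[str] = LORA_ORDER) -> List[str]:
    """List the active LoRA category for every denoising step."""
    return [active_lora(t, tau, order) for t in range(num_steps)]


if __name__ == "__main__":
    # 100 steps matches the realistic-style setting quoted above.
    schedule = switch_schedule(100)
    print(schedule[:12])
```

For composition sets with fewer than five LoRAs, `order` would presumably hold only the categories present in that set; that handling is likewise an assumption rather than something stated in the text above.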