Mix-of-Show: Decentralized Low-Rank Adaptation for Multi-Concept Customization of Diffusion Models

Authors: Yuchao Gu, Xintao Wang, Jay Zhangjie Wu, Yujun Shi, Yunpeng Chen, Zihan Fan, Wuyou Xiao, Rui Zhao, Shuning Chang, Weijia Wu, Yixiao Ge, Ying Shan, Mike Zheng Shou

NeurIPS 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments demonstrate that Mix-of-Show is capable of composing multiple customized concepts with high fidelity, including characters, objects, and scenes.
Researcher Affiliation | Collaboration | Yuchao Gu (1), Xintao Wang (3), Jay Zhangjie Wu (1), Yujun Shi (2), Yunpeng Chen (2), Zihan Fan (2), Wuyou Xiao (2), Rui Zhao (1), Shuning Chang (1), Weijia Wu (1), Yixiao Ge (3), Ying Shan (3), Mike Zheng Shou (1); (1) Show Lab; (2) National University of Singapore; (3) ARC Lab, Tencent PCG
Pseudocode | No | The paper describes its methods verbally and with diagrams (e.g., Figure 4: Pipeline of Mix-of-Show), but it does not include any formal pseudocode or algorithm blocks.
Open Source Code | No | The paper provides a link to a project webpage (https://showlab.github.io/Mix-of-Show) but does not explicitly state that the source code for the described methodology is available at this link, nor is it a direct link to a code repository.
Open Datasets | No | The paper states, "To conduct evaluation for Mix-of-Show, we collect a dataset containing characters, objects, and scenes," but it does not provide any specific link, DOI, or formal citation for this collected dataset to indicate its public availability.
Dataset Splits | No | The paper mentions collecting a dataset for evaluation but does not specify any training, validation, or test dataset splits (e.g., percentages or sample counts).
Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., GPU models, CPU specifications, memory) used for running the experiments. It only states that "More details are provided in the supplementary," but these details are not available in the main paper.
Software Dependencies | No | The paper mentions using the "Adam [48] optimizer" and "LBFGS optimizer [49]" but does not specify version numbers for these optimizers or for any other key software components such as programming languages, libraries (e.g., PyTorch, TensorFlow), or operating systems. It states "More details are provided in the supplementary," but these are not in the main text.
Experiment Setup | Yes | For ED-LoRA tuning, we incorporate LoRA layers into the linear layers in all attention modules of the text encoder and UNet, with a rank of r = 4 in all experiments. We use the Adam [48] optimizer with learning rates of 1e-3, 1e-5, and 1e-4 for tuning the text embedding, text encoder, and UNet, respectively. For gradient fusion, we use the LBFGS optimizer [49] with 500 and 50 steps to optimize the text encoder and UNet, respectively.
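
The quoted setup translates into a fairly standard PyTorch training configuration. The sketch below is a minimal, hedged reconstruction of that recipe, not the authors' released code: the helper names (LoRALinear, inject_lora, lora_params, build_adam, fuse_layer), the placeholder modules (text_encoder, unet, new_token_embedding), and the "attention"-in-class-name heuristic are all illustrative assumptions; only the rank (r = 4), the per-component Adam learning rates, and the LBFGS step counts come from the paper.

```python
# Minimal sketch of the ED-LoRA tuning / gradient-fusion setup quoted above.
# All helper and module names are hypothetical placeholders.
import torch
import torch.nn as nn


class LoRALinear(nn.Module):
    """Frozen nn.Linear plus a rank-r low-rank residual (generic LoRA wrapper)."""

    def __init__(self, base: nn.Linear, rank: int = 4):
        super().__init__()
        self.base = base.requires_grad_(False)          # keep pretrained weight frozen
        self.lora_down = nn.Linear(base.in_features, rank, bias=False)
        self.lora_up = nn.Linear(rank, base.out_features, bias=False)
        nn.init.zeros_(self.lora_up.weight)             # residual starts at zero

    def forward(self, x):
        return self.base(x) + self.lora_up(self.lora_down(x))


def inject_lora(model: nn.Module, rank: int = 4) -> None:
    """Wrap every nn.Linear inside attention modules (class-name heuristic) with LoRA."""
    attn_modules = [m for m in model.modules()
                    if "attention" in type(m).__name__.lower()]
    for module in attn_modules:
        for name, child in list(module.named_children()):
            if isinstance(child, nn.Linear):
                setattr(module, name, LoRALinear(child, rank))


def lora_params(model: nn.Module):
    """Collect only the trainable low-rank parameters."""
    return [p for n, p in model.named_parameters() if "lora_" in n]


def build_adam(new_token_embedding, text_encoder, unet):
    """Adam with the per-component learning rates reported in the paper."""
    inject_lora(text_encoder, rank=4)
    inject_lora(unet, rank=4)
    return torch.optim.Adam([
        {"params": [new_token_embedding], "lr": 1e-3},    # concept token embedding
        {"params": lora_params(text_encoder), "lr": 1e-5},
        {"params": lora_params(unet), "lr": 1e-4},
    ])


def fuse_layer(w0: torch.Tensor, inputs, targets, steps: int) -> torch.Tensor:
    """LBFGS fit of one fused weight to every concept's (activation, output) pairs.

    Per the quoted step counts, `steps` would be 500 for text-encoder layers
    and 50 for UNet layers.
    """
    w = w0.clone().requires_grad_(True)
    opt = torch.optim.LBFGS([w], max_iter=steps)

    def closure():
        opt.zero_grad()
        loss = sum(((x @ w.T - y) ** 2).sum() for x, y in zip(inputs, targets))
        loss.backward()
        return loss

    opt.step(closure)
    return w.detach()
```

The per-parameter-group Adam mirrors the three learning rates in the quote, while the LBFGS closure corresponds to fitting a single fused weight against the activations and outputs collected from each single-concept model during gradient fusion.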