Mix-of-Show: Decentralized Low-Rank Adaptation for Multi-Concept Customization of Diffusion Models
Authors: Yuchao Gu, Xintao Wang, Jay Zhangjie Wu, Yujun Shi, Yunpeng Chen, Zihan Fan, Wuyou Xiao, Rui Zhao, Shuning Chang, Weijia Wu, Yixiao Ge, Ying Shan, Mike Zheng Shou
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments demonstrate that Mix-of-Show is capable of composing multiple customized concepts with high fidelity, including characters, objects, and scenes. |
| Researcher Affiliation | Collaboration | Yuchao Gu¹, Xintao Wang³, Jay Zhangjie Wu¹, Yujun Shi², Yunpeng Chen², Zihan Fan², Wuyou Xiao², Rui Zhao¹, Shuning Chang¹, Weijia Wu¹, Yixiao Ge³, Ying Shan³, Mike Zheng Shou¹ (¹Show Lab, ²National University of Singapore, ³ARC Lab, Tencent PCG) |
| Pseudocode | No | The paper describes its methods verbally and with diagrams (e.g., Figure 4: Pipeline of Mix-of-Show), but it does not include any formal pseudocode or algorithm blocks. |
| Open Source Code | No | The paper links to a project webpage (https://showlab.github.io/Mix-of-Show) but does not state that the source code for the described methodology is available there, and the link does not point directly to a code repository. |
| Open Datasets | No | The paper states, "To conduct evaluation for Mix-of-Show, we collect a dataset containing characters, objects, and scenes," but it does not provide any specific link, DOI, or formal citation indicating that this collected dataset is publicly available. |
| Dataset Splits | No | The paper mentions collecting a dataset for evaluation but does not specify any training, validation, or test dataset splits (e.g., percentages or sample counts). |
| Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., GPU models, CPU specifications, memory) used for running the experiments. It only states that "More details are provided in the supplementary," but these details are not available in the main paper. |
| Software Dependencies | No | The paper mentions using "Adam [48] optimizer" and "LBFGS optimizer [49]" but does not specify version numbers for these optimizers or for any other key software components like programming languages, libraries (e.g., PyTorch, TensorFlow), or operating systems. It states "More details are provided in the supplementary," but these are not in the main text. |
| Experiment Setup | Yes | For ED-LoRA tuning, we incorporate LoRA layers into the linear layers of all attention modules in the text encoder and UNet, with a rank of r = 4 in all experiments. We use the Adam [48] optimizer with learning rates of 1e-3, 1e-5, and 1e-4 for tuning the text embedding, text encoder, and UNet, respectively. For gradient fusion, we use the LBFGS optimizer [49] with 500 and 50 steps to optimize the text encoder and UNet, respectively. |
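
The Experiment Setup row above pins down the key reported hyperparameters: rank-4 LoRA on the attention linear layers, per-module Adam learning rates, and LBFGS-based gradient fusion. The minimal PyTorch sketch below shows one way those settings could be wired up; it is not the authors' released implementation, and the `LoRALinear` wrapper, the placeholder `text_embedding`/`text_encoder`/`unet` modules, and the `fuse_weights` helper are illustrative assumptions rather than the paper's ED-LoRA or fusion code.

```python
# Hedged sketch only: toy modules and a generic LoRA wrapper standing in for the
# paper's ED-LoRA; the hyperparameter values are the ones quoted in the table.
import torch
import torch.nn as nn


class LoRALinear(nn.Module):
    """Frozen linear layer plus a trainable rank-4 residual (generic LoRA)."""

    def __init__(self, base: nn.Linear, rank: int = 4):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)              # pretrained weights stay frozen
        self.down = nn.Linear(base.in_features, rank, bias=False)
        self.up = nn.Linear(rank, base.out_features, bias=False)
        nn.init.zeros_(self.up.weight)           # LoRA starts as a zero update

    def forward(self, x):
        return self.base(x) + self.up(self.down(x))


# Toy stand-ins for the text embedding, text encoder, and UNet attention linears.
text_embedding = nn.Embedding(10, 32)
text_encoder = nn.Sequential(LoRALinear(nn.Linear(32, 32)))
unet = nn.Sequential(LoRALinear(nn.Linear(32, 32)))

# Adam with the learning rates reported in the paper:
# 1e-3 (text embedding), 1e-5 (text encoder), 1e-4 (UNet).
optimizer = torch.optim.Adam([
    {"params": text_embedding.parameters(), "lr": 1e-3},
    {"params": [p for p in text_encoder.parameters() if p.requires_grad], "lr": 1e-5},
    {"params": [p for p in unet.parameters() if p.requires_grad], "lr": 1e-4},
])


def fuse_weights(per_concept_weights, per_concept_inputs, steps):
    """Least-squares-style fusion of per-concept weights with LBFGS.

    per_concept_weights: list of (out_dim, in_dim) tensors, one per concept.
    per_concept_inputs:  list of (n_i, in_dim) activation tensors, one per concept.
    steps: 500 for text-encoder layers, 50 for UNet layers, as reported above.
    """
    fused = torch.stack(per_concept_weights).mean(0).clone().requires_grad_(True)
    opt = torch.optim.LBFGS([fused], max_iter=steps)

    def closure():
        opt.zero_grad()
        loss = sum(
            ((x @ fused.T) - (x @ w.T)).pow(2).sum()
            for w, x in zip(per_concept_weights, per_concept_inputs)
        )
        loss.backward()
        return loss

    opt.step(closure)
    return fused.detach()
```

This least-squares objective matches the paper's high-level description of gradient fusion (fusing per-concept updates by matching their layer outputs with LBFGS), but the exact objective, input activations, and layer traversal used in Mix-of-Show may differ from this sketch.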