MagiCapture: High-Resolution Multi-Concept Portrait Customization

Authors: Junha Hyung, Jaeyo Shin, Jaegul Choo

AAAI 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | MagiCapture outperforms other baselines in both quantitative and qualitative evaluations and can also be generalized to other non-human objects. The paper includes dedicated sections for "Experiments," "Training Details," "Comparisons," and "Ablation Study," which detail empirical studies and performance metrics.
Researcher Affiliation | Academia | Junha Hyung*1 (KAIST AI), Jaeyo Shin*2 (Sogang University), and Jaegul Choo1 (KAIST AI); {sharpeeee, jchoo}@kaist.ac.kr, tlswody123@sogang.ac.kr
Pseudocode | No | The paper describes its methods using prose and mathematical equations but does not include any explicitly labeled pseudocode or algorithm blocks.
Open Source Code | No | The paper does not explicitly state that its source code is released, nor does it provide a link to a code repository for the described methodology.
Open Datasets | Yes | We choose 10 identities, 7 from VGGFace (Cao et al. 2018) and 3 in-the-wild identities gathered from the internet. We also manually select 10 style concepts...
Dataset Splits | No | The paper mentions training steps and the number of source/reference images used (4 to 6), but it does not specify explicit train/validation/test dataset splits with percentages, sample counts, or references to predefined splits.
Hardware Specification | Yes | The model is trained on a single GeForce RTX 3090 GPU, using a batch size of 1 and gradient accumulation over 4 steps.
Software Dependencies | No | The paper mentions using "pre-trained Stable Diffusion V1.5" but does not specify version numbers for general software dependencies such as Python, PyTorch, or other libraries required for replication.
Experiment Setup | Yes | The first training phase consists of a total of 1200 steps, with a learning rate of 5e-4 for updating the text embeddings. In the second LoRA phase, the learning rate is 1e-4 for the projection layers and 1e-5 for the text embeddings, with a total of 1500 training steps. The model is trained on a single GeForce RTX 3090 GPU, using a batch size of 1 and gradient accumulation over 4 steps. (See the sketches after this table.)