MagiCapture: High-Resolution Multi-Concept Portrait Customization
Authors: Junha Hyung, Jaeyo Shin, Jaegul Choo
AAAI 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | MagiCapture outperforms other baselines in both quantitative and qualitative evaluations and can also be generalized to other non-human objects. The paper includes dedicated sections for "Experiments," "Training Details," "Comparisons," and "Ablation Study," which detail empirical studies and performance metrics. |
| Researcher Affiliation | Academia | Junha Hyung*¹, Jaeyo Shin*², and Jaegul Choo¹ (¹KAIST AI, ²Sogang University); {sharpeeee, jchoo}@kaist.ac.kr, tlswody123@sogang.ac.kr |
| Pseudocode | No | The paper describes its methods using prose and mathematical equations but does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not explicitly state the release of its source code or provide a link to a code repository for the methodology described. |
| Open Datasets | Yes | We choose 10 identities, 7 from VGGFace (Cao et al. 2018) and 3 in-the-wild identities gathered from the internet. We also manually select 10 style concepts... |
| Dataset Splits | No | The paper mentions training steps and the number of source/reference images used (4 to 6), but it does not specify explicit train/validation/test dataset splits with percentages, sample counts, or references to predefined splits. |
| Hardware Specification | Yes | The model is trained on a single GeForce RTX 3090 GPU, using a batch size of 1 and gradient accumulation over 4 steps. |
| Software Dependencies | No | The paper mentions using 'pre-trained Stable Diffusion V1.5' but does not specify version numbers for general software dependencies such as Python, PyTorch, TensorFlow, or other libraries required for replication. |
| Experiment Setup | Yes | The first training phase consists of a total of 1200 steps, with a learning rate of 5e-4 for updating the text embeddings. In the second LoRA phase, the learning rate is 1e-4 for the projection layers and 1e-5 for the text embeddings, with a total of 1500 training steps. The model is trained on a single GeForce RTX 3090 GPU, using a batch size of 1 and gradient accumulation over 4 steps. (See the configuration sketch below.) |
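
The hyperparameters reported in the "Experiment Setup" and "Hardware Specification" rows can be collected into a single training configuration. The sketch below is a minimal illustration, not the authors' released code: the class and field names are ours, the Hugging Face model id is an assumed stand-in for "pre-trained Stable Diffusion V1.5," and only the numeric values (step counts, learning rates, batch size, gradient accumulation) come from the paper.

```python
from dataclasses import dataclass

import torch


@dataclass
class MagiCaptureConfig:
    """Hyperparameters reported in the paper; field names are ours, values are the paper's."""
    # Phase 1: optimize the text embeddings only.
    phase1_steps: int = 1200
    phase1_text_lr: float = 5e-4
    # Phase 2: LoRA fine-tuning of the projection layers plus continued embedding updates.
    phase2_steps: int = 1500
    phase2_lora_lr: float = 1e-4
    phase2_text_lr: float = 1e-5
    # Single GeForce RTX 3090: batch size 1 with gradient accumulation over 4 steps.
    batch_size: int = 1
    grad_accum_steps: int = 4
    # Assumed hub id; the paper only says "pre-trained Stable Diffusion V1.5".
    base_model: str = "runwayml/stable-diffusion-v1-5"


cfg = MagiCaptureConfig()

# Placeholder tensors standing in for the real LoRA projection weights and the
# learned text embeddings; in a real run these come from the diffusion pipeline.
lora_params = [torch.nn.Parameter(torch.zeros(4, 4))]
text_embedding_params = [torch.nn.Parameter(torch.zeros(2, 768))]

# Phase-2 optimizer: the two reported learning rates map onto two parameter groups.
optimizer = torch.optim.AdamW([
    {"params": lora_params, "lr": cfg.phase2_lora_lr},
    {"params": text_embedding_params, "lr": cfg.phase2_text_lr},
])
```

With a batch size of 1 and gradient accumulation over 4 steps, the effective batch size is 4. The paper does not name its optimizer, so AdamW above is an assumption.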