FaceComposer: A Unified Model for Versatile Facial Content Creation
Authors: Jiayu Wang, Kang Zhao, Yifeng Ma, Shiwei Zhang, Yingya Zhang, Yujun Shen, Deli Zhao, Jingren Zhou
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments suggest that our approach not only achieves comparable or even better performance than state-of-the-arts on each single task, but also facilitates some combined tasks with one-time forward, demonstrating its potential in serving as a foundation generative model in face domain. |
| Researcher Affiliation | Collaboration | Jiayu Wang¹, Kang Zhao¹, Yifeng Ma², Shiwei Zhang¹, Yingya Zhang¹, Yujun Shen³, Deli Zhao¹, Jingren Zhou¹ (¹Alibaba Group, ²Tsinghua University, ³Ant Group) |
| Pseudocode | No | The paper presents a framework diagram (Figure 1) but does not include structured pseudocode or algorithm blocks. |
| Open Source Code | No | Code, dataset, model, and interface will be made publicly available. |
| Open Datasets | Yes | To construct the image part of our database, we carefully clean up LAION-Face [60] and merge the cleaned dataset with CelebA-HQ [16] and FFHQ [17]. ... We evaluate FaceComposer on face generation, face animation and face editing tasks, which respectively using the Multi-Modal CelebA-HQ [51], HDTF [59] + MEAD-Neutral (a subset of MEAD [45] that only contains the neutral facial expression videos)... |
| Dataset Splits | No | The paper uses several datasets for training and evaluation (LAION-Face, CelebA-HQ, FFHQ, HDTF, MEAD-Neutral) but does not explicitly report the percentages or counts used for the training, validation, and test splits. |
| Hardware Specification | No | The paper does not specify the exact hardware details (e.g., GPU/CPU models, memory) used for running its experiments. |
| Software Dependencies | No | The paper mentions starting from a 'pre-trained LDMs*' with a general GitHub link to Stable Diffusion, but does not provide specific version numbers for any software dependencies or libraries used. |
| Experiment Setup | Yes | During the training, our model starts from a pre-trained LDMs*, and is further trained on our multi-modal face database through a joint training mechanism. ... For the LDMs, we pretrain it with 1M steps on the full multi-modal dataset using only T2F Embeddings as the condition, and then finetune the model for 200K steps with all conditions enabled. The prior model is trained for 1M steps on the image dataset. ... we set H = W = 256 in experiments. ... setting 0.5 dropout probability for each condition, 0.1 to drop all conditions, and 0.1 to reserve them all. |
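
The Experiment Setup row quotes a condition-dropout scheme used during joint training: each condition is dropped with probability 0.5, all conditions are dropped with probability 0.1, and all conditions are kept with probability 0.1. The sketch below is one possible reading of that scheme (the branching order and the condition names are assumptions, since the paper only states the three probabilities), not the authors' released implementation.

```python
import random

def sample_condition_mask(condition_names,
                          p_drop_all=0.1,
                          p_keep_all=0.1,
                          p_drop_each=0.5):
    """Return a dict mapping each condition name to True (keep) or False (drop).

    Assumed interpretation of the quoted scheme: with probability 0.1 every
    condition is dropped, with probability 0.1 every condition is kept, and
    otherwise each condition is dropped independently with probability 0.5.
    """
    u = random.random()
    if u < p_drop_all:
        return {name: False for name in condition_names}
    if u < p_drop_all + p_keep_all:
        return {name: True for name in condition_names}
    return {name: random.random() >= p_drop_each for name in condition_names}


# Example usage: the condition names below are hypothetical placeholders,
# not identifiers taken from the paper.
mask = sample_condition_mask(["t2f_embedding", "sketch", "mask", "audio"])
print(mask)  # e.g. {'t2f_embedding': True, 'sketch': False, ...}
```

Such per-condition dropout is a common way to make a single diffusion model usable with any subset of conditions at inference time (and to enable classifier-free guidance), which is consistent with the paper's goal of handling single and combined face tasks in one forward pass.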