FaceComposer: A Unified Model for Versatile Facial Content Creation
Authors: Jiayu Wang, Kang Zhao, Yifeng Ma, Shiwei Zhang, Yingya Zhang, Yujun Shen, Deli Zhao, Jingren Zhou
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments suggest that our approach not only achieves comparable or even better performance than state-of-the-arts on each single task, but also facilitates some combined tasks with one-time forward, demonstrating its potential in serving as a foundation generative model in face domain. |
| Researcher Affiliation | Collaboration | Jiayu Wang¹, Kang Zhao¹, Yifeng Ma², Shiwei Zhang¹, Yingya Zhang¹, Yujun Shen³, Deli Zhao¹, Jingren Zhou¹ (¹Alibaba Group, ²Tsinghua University, ³Ant Group) |
| Pseudocode | No | The paper presents a framework diagram (Figure 1) but does not include structured pseudocode or algorithm blocks. |
| Open Source Code | No | Code, dataset, model, and interface will be made publicly available. |
| Open Datasets | Yes | To construct the image part of our database, we carefully clean up LAION-Face [60] and merge the cleaned dataset with CelebA-HQ [16] and FFHQ [17]. ... We evaluate FaceComposer on face generation, face animation and face editing tasks, which respectively using the Multi-Modal CelebA-HQ [51], HDTF [59] + MEAD-Neutral (a subset of MEAD [45] that only contains the neutral facial expression videos)... |
| Dataset Splits | No | The paper uses several datasets for training and evaluation (LAION-Face, CelebA-HQ, FFHQ, HDTF, MEAD-Neutral) but does not explicitly report the percentages or counts used for the training, validation, and test splits. |
| Hardware Specification | No | The paper does not specify the exact hardware details (e.g., GPU/CPU models, memory) used for running its experiments. |
| Software Dependencies | No | The paper mentions starting from a 'pre-trained LDMs*' with a general GitHub link to Stable Diffusion, but does not provide specific version numbers for any software dependencies or libraries used. |
| Experiment Setup | Yes | During the training, our model starts from a pre-trained LDMs*, and is further trained on our multi-modal face database through a joint training mechanism. ... For the LDMs, we pretrain it with 1M steps on the full multi-modal dataset using only T2F Embeddings as the condition, and then finetune the model for 200K steps with all conditions enabled. The prior model is trained for 1M steps on the image dataset. ... we set H = W = 256 in experiments. ... setting 0.5 dropout probability for each condition, 0.1 to drop all conditions, and 0.1 to reserve them all. |
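
The Experiment Setup row quotes a condition-dropout scheme used during joint training: each condition is dropped with probability 0.5, all conditions are dropped with probability 0.1, and all conditions are kept with probability 0.1. The sketch below is one possible reading of that scheme (the branching order and the condition names are assumptions, since the paper only states the three probabilities), not the authors' released implementation.

```python
import random

def sample_condition_mask(condition_names,
                          p_drop_all=0.1,
                          p_keep_all=0.1,
                          p_drop_each=0.5):
    """Return a dict mapping each condition name to True (keep) or False (drop).

    Assumed interpretation of the quoted scheme: with probability 0.1 every
    condition is dropped, with probability 0.1 every condition is kept, and
    otherwise each condition is dropped independently with probability 0.5.
    """
    u = random.random()
    if u < p_drop_all:
        return {name: False for name in condition_names}
    if u < p_drop_all + p_keep_all:
        return {name: True for name in condition_names}
    return {name: random.random() >= p_drop_each for name in condition_names}


# Example usage: the condition names below are hypothetical placeholders,
# not identifiers taken from the paper.
mask = sample_condition_mask(["t2f_embedding", "sketch", "mask", "audio"])
print(mask)  # e.g. {'t2f_embedding': True, 'sketch': False, ...}
```

Such per-condition dropout is a common way to make a single diffusion model usable with any subset of conditions at inference time (and to enable classifier-free guidance), which is consistent with the paper's goal of handling single and combined face tasks in one forward pass.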