FlowFace: Semantic Flow-Guided Shape-Aware Face Swapping

Authors: Hao Zeng, Wei Zhang, Changjie Fan, Tangjie Lv, Suzhen Wang, Zhimeng Zhang, Bowen Ma, Lincheng Li, Yu Ding, Xin Yu

AAAI 2023

Reproducibility Variable Result LLM Response
Research Type Experimental Extensive quantitative and qualitative experiments on in-the-wild faces demonstrate that our FlowFace outperforms the state-of-the-art significantly.
Researcher Affiliation Collaboration Hao Zeng¹, Wei Zhang¹, Changjie Fan¹, Tangjie Lv¹, Suzhen Wang¹, Zhimeng Zhang¹, Bowen Ma¹, Lincheng Li¹, Yu Ding¹,³,*, Xin Yu² (¹Virtual Human Group, NetEase Fuxi AI Lab; ²University of Technology Sydney; ³Zhejiang University)
Pseudocode No The paper does not contain any structured pseudocode or algorithm blocks.
Open Source Code No More details are in the supplementary materials and our codes will be made publicly available upon publication of the paper.
Open Datasets Yes The training dataset is collected from three commonly-used face datasets: CelebA-HQ (Karras et al. 2017), FFHQ (Karras, Laine, and Aila 2019), and VGGFace2 (Cao et al. 2018).
Dataset Splits Yes The final dataset contains 350K face images, of which 10K images are randomly sampled as the validation dataset. For the comparison experiments, the test set is constructed by sampling FaceForensics++ (FF++) (Rössler et al. 2019), following (Li et al. 2019). Specifically, FF++ consists of 1000 video clips, and the test set is collected by sampling ten frames from each clip of FF++, for a total of 10,000 images.
Hardware Specification No The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running experiments.
Software Dependencies No The paper mentions software components such as the Adam optimizer, pre-trained face recognition models, and VGG, but does not provide version numbers for any software dependencies.
Experiment Setup Yes Our FlowFace is trained in a two-stage manner. Specifically, F^res is first trained for 32K steps with a batch size of eight. As for F^swa, we first pre-train the face encoder following the training strategy of MAE on our face dataset. Then we fix the encoder and train the other components of F^swa for 640K steps with a batch size of eight. We adopt the Adam (Kingma and Ba 2014) optimizer with β1=0 and β2=0.99, and the learning rate is set to 0.0001.
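The reported optimizer settings (Adam with β1=0, β2=0.99, learning rate 0.0001) can be illustrated with a minimal, generic sketch of a single Adam update step. This is not the authors' code, just the standard Adam update with the paper's hyperparameters; note that with β1=0 the first-moment estimate collapses to the raw gradient, giving an RMSprop-style update with bias correction.

```python
import math

def adam_step(theta, grad, m, v, t,
              lr=1e-4, beta1=0.0, beta2=0.99, eps=1e-8):
    """One Adam update using the paper's reported hyperparameters.

    With beta1=0 the first moment m equals the current gradient,
    so momentum is effectively disabled.
    """
    m = beta1 * m + (1 - beta1) * grad       # first moment (== grad when beta1=0)
    v = beta2 * v + (1 - beta2) * grad ** 2  # second moment (squared-gradient EMA)
    m_hat = m / (1 - beta1 ** t)             # bias correction
    v_hat = v / (1 - beta2 ** t)
    theta = theta - lr * m_hat / (math.sqrt(v_hat) + eps)
    return theta, m, v

# Hypothetical usage: one step on a scalar parameter.
theta, m, v = adam_step(theta=0.5, grad=0.1, m=0.0, v=0.0, t=1)
```

In a framework such as PyTorch this corresponds to something like `torch.optim.Adam(params, lr=1e-4, betas=(0.0, 0.99))`.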