FuseAnyPart: Diffusion-Driven Facial Parts Swapping via Multiple Reference Images
Authors: Zheng Yu, Yaohua Wang, Siying Cui, Aixi Zhang, Wei-Long Zheng, Senzhang Wang
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments qualitatively and quantitatively validate the superiority and robustness of FuseAnyPart. Source codes are available at https://github.com/Thomas-wyh/FuseAnyPart. Dataset: We train our model on the CelebA-HQ [11] dataset. The CelebA-HQ dataset contains 30,000 high-resolution face images of celebrities widely used for face generation and face swapping tasks. This dataset has been pre-processed and aligned, and is available in three different resolutions. In our experiments, we use the 1024×1024 resolution. Our evaluation set is sampled from the FaceForensics++ [25] dataset, which contains 1,000 videos. We randomly sample 10 frames from each video and obtain 10,000 images. |
| Researcher Affiliation | Collaboration | Zheng Yu (Shanghai Jiao Tong University & Alibaba Group, cs-yuzheng@sjtu.edu.cn); Yaohua Wang* (Alibaba Group, xiachen.wyh@alibaba-inc.com); Siying Cui (Peking University & Alibaba Group, cuisiying.csy@alibaba-inc.com); Aixi Zhang (Alibaba Group, aixi.zhax@alibaba-inc.com); Wei-Long Zheng (Shanghai Jiao Tong University, weilong@sjtu.edu.cn); Senzhang Wang (Central South University, szwang@csu.edu.cn) |
| Pseudocode | No | The paper does not contain any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | Source codes are available at https://github.com/Thomas-wyh/FuseAnyPart. |
| Open Datasets | Yes | We train our model on the CelebA-HQ [11] dataset. The CelebA-HQ dataset contains 30,000 high-resolution face images of celebrities widely used for face generation and face swapping tasks. This dataset has been pre-processed and aligned, and is available in three different resolutions. In our experiments, we use the 1024×1024 resolution. Our evaluation set is sampled from the FaceForensics++ [25] dataset |
| Dataset Splits | No | The paper specifies the training and evaluation (test) datasets but does not describe a separate validation split or how it was used for hyperparameter tuning. The 'evaluation set' it mentions is used for quantitative comparisons, i.e., it serves as a test set. |
| Hardware Specification | Yes | We train our model on 16 NVIDIA A100 GPUs (80GB) with a batch size of 16 per GPU using the AdamW optimizer [16] with a constant learning rate of 1e-4 and weight decay of 0.01. |
| Software Dependencies | Yes | Our implementation is based on the Hugging Face diffusers [30] library, and we use Stable Diffusion v1-5 [24] and OpenAI's clip-vit-large-patch14 vision model [22]. |
| Experiment Setup | Yes | We train our model on 16 NVIDIA A100 GPUs (80GB) with a batch size of 16 per GPU using the AdamW optimizer [16] with a constant learning rate of 1e-4 and weight decay of 0.01. During training, facial part reference images are randomly sampled from images with the same ID, and the target image is consistent with the face reference image. During the inference stage, we use the DDIM [28] sampler with 50 steps and set λ = 1.0. Since we do not use a text prompt, we set the text prompt to empty. |
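The quoted setup can be sketched as a minimal PyTorch configuration. This is illustrative only: the `Linear` module is a stand-in for the paper's trainable components, and the commented diffusers snippet assumes the standard Stable Diffusion v1-5 pipeline rather than the authors' released code.

```python
import torch

# Optimizer configuration as reported: AdamW, constant lr 1e-4, weight decay 0.01.
def make_optimizer(params):
    return torch.optim.AdamW(params, lr=1e-4, weight_decay=0.01)

model = torch.nn.Linear(8, 8)  # placeholder for the trainable modules
optimizer = make_optimizer(model.parameters())

# The reported inference setup (DDIM sampler, 50 steps, empty text prompt) would
# look roughly like this with diffusers (commented out to avoid a model download):
#
# from diffusers import StableDiffusionPipeline, DDIMScheduler
# pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")
# pipe.scheduler = DDIMScheduler.from_config(pipe.scheduler.config)
# image = pipe(prompt="", num_inference_steps=50).images[0]
```

Note that the paper's per-GPU batch size of 16 across 16 A100s implies an effective batch size of 256; any data-parallel wrapper (e.g., DDP) is omitted here.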