Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

FaceShot: Bring Any Character into Life

Authors: Junyao Gao, Yanan Sun, Fei Shen, Xin Jiang, Zhening Xing, Kai Chen, Cai Zhao

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments on our newly constructed character benchmark CharacBench confirm that FaceShot consistently surpasses state-of-the-art (SOTA) approaches across any character domain. More results are available at our project website https://faceshot2024.github.io/faceshot/.
Researcher Affiliation | Collaboration | 1Tongji University, 2Shanghai AI Laboratory, 3Nanjing University of Science and Technology. EMAIL, EMAIL, EMAIL.
Pseudocode | No | The paper describes the framework components and their functionalities using textual explanations and mathematical equations (e.g., Eq. 1, 2, 3, 5), but does not contain any explicit "Pseudocode" or "Algorithm" blocks with structured steps.
Open Source Code | No | Our code will be publicly released to encourage responsible use in areas like entertainment and education, while discouraging unethical practices, including misinformation and harassment.
Open Datasets | Yes | Moreover, we consider videos of human head from RAVDESS (Livingstone & Russo, 2018) as our driving videos.
Dataset Splits | No | In FaceShot, we use a single H800 to generate animation results. And we have included a total of 46 images and 24 driving videos in CharacBench, with each video consisting of 110 to 127 frames. All videos (.mp4) and images (.jpg) are processed into a resolution of 512×512.
Hardware Specification | Yes | In FaceShot, we use a single H800 to generate animation results. And we have included a total of 46 images and 24 driving videos in CharacBench, with each video consisting of 110 to 127 frames.
Software Dependencies | No | For appearance-guided landmark matching, we utilize Stable Diffusion v1.5 along with the pre-trained weights of IP-Adapter (Ye et al., 2023) to extract diffusion features from the images.
Experiment Setup | Yes | For appearance-guided landmark matching, we utilize Stable Diffusion v1.5 along with the pre-trained weights of IP-Adapter (Ye et al., 2023) to extract diffusion features from the images. Specifically, we set the time step t = 301, the U-Net layer l = 6, and the number of target images k = 10. Additionally, following MOFA-Video, we use N = 68 keypoints (Sagonas et al., 2016) as facial landmarks and M = 64 frames for animation.
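The hyperparameters quoted in the Experiment Setup row can be gathered into a small configuration sketch. This is only an illustration of the reported values; the class and field names are hypothetical and do not come from the authors' (unreleased) code:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FaceShotConfig:
    """Hyperparameters reported in the FaceShot paper (field names illustrative)."""
    timestep: int = 301          # diffusion time step t used for feature extraction
    unet_layer: int = 6          # U-Net layer l whose diffusion features are matched
    num_target_images: int = 10  # k target images for appearance-guided landmark matching
    num_keypoints: int = 68      # N facial landmarks (Sagonas et al., 2016)
    num_frames: int = 64         # M animation frames, following MOFA-Video
    resolution: int = 512        # videos and images processed to 512x512


cfg = FaceShotConfig()
print(cfg.timestep, cfg.unet_layer, cfg.num_target_images)
```

A frozen dataclass keeps the reported settings immutable and self-documenting, which is convenient when re-running a reproduction with the same values.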