Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
FaceShot: Bring Any Character into Life
Authors: Junyao Gao, Yanan Sun, Fei Shen, Xin Jiang, Zhening Xing, Kai Chen, Cairong Zhao
ICLR 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on our newly constructed character benchmark CharacBench confirm that FaceShot consistently surpasses state-of-the-art (SOTA) approaches across any character domain. More results are available at our project website https://faceshot2024.github.io/faceshot/. |
| Researcher Affiliation | Collaboration | 1Tongji University, 2Shanghai AI Laboratory, 3Nanjing University of Science and Technology. EMAIL, EMAIL, EMAIL. |
| Pseudocode | No | The paper describes the framework components and their functionalities using textual explanations and mathematical equations (e.g., Eq 1, 2, 3, 5), but does not contain any explicit 'Pseudocode' or 'Algorithm' blocks with structured steps. |
| Open Source Code | No | Our code will be publicly released to encourage responsible use in areas like entertainment and education, while discouraging unethical practices, including misinformation and harassment. |
| Open Datasets | Yes | Moreover, we consider videos of human heads from RAVDESS (Livingstone & Russo, 2018) as our driving videos. |
| Dataset Splits | No | In FaceShot, we use a single H800 to generate animation results. And we have included a total of 46 images and 24 driving videos in CharacBench, with each video consisting of 110 to 127 frames. All videos (.mp4) and images (.jpg) are processed into a resolution of 512 × 512. |
| Hardware Specification | Yes | In FaceShot, we use a single H800 to generate animation results. And we have included a total of 46 images and 24 driving videos in CharacBench, with each video consisting of 110 to 127 frames. |
| Software Dependencies | No | For appearance-guided landmark matching, we utilize Stable Diffusion v1.5 along with the pre-trained weights of IP-Adapter (Ye et al., 2023) to extract diffusion features from the images. |
| Experiment Setup | Yes | For appearance-guided landmark matching, we utilize Stable Diffusion v1.5 along with the pre-trained weights of IP-Adapter (Ye et al., 2023) to extract diffusion features from the images. Specifically, we set the time step t = 301, the U-Net layer l = 6, and the number of target images k = 10. Additionally, following MOFA-Video, we use N = 68 keypoints (Sagonas et al., 2016) as facial landmarks and M = 64 frames for animation. |
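For readers attempting a reproduction, the hyperparameters quoted in the Experiment Setup row can be collected into a single configuration sketch. This is a minimal illustration only: the authors' code is not yet released, so every field name below is a hypothetical label for a value reported in the paper, not an identifier from the official implementation.

```python
# Hypothetical configuration sketch of the FaceShot experiment setup.
# All identifiers are illustrative; only the values are taken from the
# paper's reported settings.
from dataclasses import dataclass


@dataclass(frozen=True)
class FaceShotSetup:
    feature_extractor: str = "Stable Diffusion v1.5 + IP-Adapter"
    diffusion_timestep: int = 301      # t, for diffusion-feature extraction
    unet_layer: int = 6                # l, U-Net layer used for features
    num_target_images: int = 10        # k, appearance-guided landmark matching
    num_keypoints: int = 68            # N, facial landmarks (Sagonas et al., 2016)
    num_frames: int = 64               # M, animation length, following MOFA-Video
    resolution: tuple = (512, 512)     # all CharacBench images and videos


cfg = FaceShotSetup()
print(cfg.diffusion_timestep, cfg.num_keypoints, cfg.num_frames)
```

Freezing the dataclass makes the reported settings immutable, which helps keep a reproduction run's logged configuration consistent with what was actually used.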