Zero-shot High-fidelity and Pose-controllable Character Animation

Authors: Bingwen Zhu, Fanyi Wang, Tianyi Lu, Peng Liu, Jingwen Su, Jinxiu Liu, Yanhao Zhang, Zuxuan Wu, Guo-Jun Qi, Yu-Gang Jiang

IJCAI 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Extensive experiment results demonstrate that our approach outperforms the state-of-the-art training-based methods in terms of character consistency and detail fidelity. Moreover, it maintains a high level of temporal coherence throughout the generated animations."
Researcher Affiliation | Collaboration | Bingwen Zhu (1,2), Fanyi Wang (3), Tianyi Lu (1,2), Peng Liu (3), Jingwen Su (3), Jinxiu Liu (4), Yanhao Zhang (3), Zuxuan Wu (1,2), Guo-Jun Qi (5), and Yu-Gang Jiang (1,2). Affiliations: (1) Shanghai Key Lab of Intell. Info. Processing, School of CS, Fudan University; (2) Shanghai Collaborative Innovation Center of Intelligent Visual Computing; (3) OPPO AI Center; (4) South China University of Technology; (5) Westlake University
Pseudocode | Yes | Algorithm 1: Pose-aware embedding optimization. Input: source character image $I_s$, source character pose $p_s$, text prompt $C$, target pose sequence $P = \{p_i\}_{i=1}^{N}$, number of frames $N$, and timestep $T$. Output: optimized source embeddings $\{\tilde{e}_{s,t}\}_{t=1}^{T}$, optimized pose-aware embeddings $\{\{\tilde{e}_{x_i,t}\}_{t=1}^{T}\}_{i=1}^{N}$, and latent code $Z_T$. (A hedged sketch of this optimization loop appears after the table.)
Open Source Code | No | The paper mentions leveraging "the official open source code of DisCo" for comparison, but it does not provide any statement or link indicating that its own source code is publicly available.
Open Datasets | Yes | "To further make a comprehensive quantitative performance comparison, we also follow the experimental settings in MagicAnimate, and evaluate both image fidelity and video quality on two benchmark datasets, namely TikTok [Jafarian and Park, 2021] and TED-talks [Siarohin et al., 2021]."
Dataset Splits | No | The paper states: "For quantitative analysis, we first randomly sample 50 in-the-wild image-text pairs and 10 different desired pose sequences to conduct evaluations," but it does not provide specific details on training, validation, or test data splits for reproducibility.
Hardware Specification | Yes | "All experiments are performed on a single NVIDIA A100 GPU."
Software Dependencies | Yes | "We implement PoseAnimate based on the public pre-trained weights of ControlNet [Zhang et al., 2023] and Stable Diffusion [Rombach et al., 2022] v1.5." (A sketch of loading these weights appears after the table.)
Experiment Setup | Yes | For each generated character animation, the authors generate N = 16 frames at a unified 512×512 resolution, using the DDIM sampler [Song et al., 2020] with the default hyperparameters: T = 50 diffusion steps and guidance scale w = 7.5. For the pose-aware control module, the loss function for optimizing the text embedding $e_{\text{text}}$ is MSE; optimization runs for 250 iterations in total with n = 5 inner iterations per step, using the Adam optimizer. (These settings are expressed as a code sketch after the table.)
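
The structure of Algorithm 1 can be pictured as follows. This is a minimal PyTorch sketch assuming only the settings the paper reports (Adam, an MSE objective, 250 iterations with n = 5 inner steps); the embedding shapes and the `diffusion_recon_loss` stand-in are hypothetical, since no official implementation is available.

```python
# Minimal sketch of Algorithm 1 (pose-aware embedding optimization).
# Shapes follow SD v1.5's CLIP text encoder (77 tokens, width 768);
# the loss below is a dummy stand-in for the paper's MSE objective.
import torch

T, N = 50, 16                  # diffusion timesteps, animation frames
TOKENS, D = 77, 768            # CLIP token count and embedding width

# One source embedding per timestep, and one pose-aware embedding per
# (frame, timestep) pair; in the paper these would be initialized from
# the text prompt C rather than from random noise as here.
e_src = torch.nn.Parameter(torch.randn(T, TOKENS, D))
e_pose = torch.nn.Parameter(torch.randn(N, T, TOKENS, D))
optimizer = torch.optim.Adam([e_src, e_pose], lr=1e-3)

def diffusion_recon_loss(emb: torch.Tensor) -> torch.Tensor:
    """Hypothetical stand-in: the real objective is an MSE between the
    UNet's noise prediction (conditioned on the embedding and the pose)
    and the target noise recovered from the source image."""
    return (emb ** 2).mean()

for step in range(250):        # 250 optimization iterations (paper setting)
    for _ in range(5):         # n = 5 inner iterations per step
        optimizer.zero_grad()
        loss = diffusion_recon_loss(e_src) + diffusion_recon_loss(e_pose)
        loss.backward()
        optimizer.step()
```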
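For the stated dependencies, the public weights can be loaded with Hugging Face diffusers as sketched below. The checkpoint IDs are common community mirrors and are an assumption; the paper does not name specific repositories.

```python
# Hedged sketch: loading ControlNet + Stable Diffusion v1.5 via diffusers.
# Checkpoint IDs are assumed community mirrors, not cited by the paper.
import torch
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline

controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-openpose",   # ControlNet [Zhang et al., 2023]
    torch_dtype=torch.float16,
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",      # Stable Diffusion v1.5
    controlnet=controlnet,
    torch_dtype=torch.float16,
).to("cuda")                               # single NVIDIA A100 in the paper
```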
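The experiment-setup hyperparameters map directly onto the sampler configuration. Continuing from the pipeline sketch above, and again only as an assumed diffusers-based reconstruction with a placeholder prompt and blank pose maps:

```python
# Generation settings reported in the paper, applied to the `pipe`
# object from the previous sketch; prompt and pose maps are placeholders.
from PIL import Image
from diffusers import DDIMScheduler

pipe.scheduler = DDIMScheduler.from_config(pipe.scheduler.config)  # DDIM sampler

prompt = "a full-body character, studio lighting"  # hypothetical prompt C
pose_sequence = [Image.new("RGB", (512, 512)) for _ in range(16)]  # N = 16 stand-in pose maps

frames = [
    pipe(
        prompt,
        image=pose_map,              # per-frame target pose p_i
        num_inference_steps=50,      # diffusion steps T = 50
        guidance_scale=7.5,          # guidance scale w = 7.5
        height=512, width=512,       # unified 512x512 resolution
    ).images[0]
    for pose_map in pose_sequence
]
```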