TPC: Test-time Procrustes Calibration for Diffusion-based Human Image Animation
Authors: Sunjae Yoon, Gwanhyeong Koo, Younghwan Lee, Chang D. Yoo
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Current diffusion-based image animation systems exhibit high precision in transferring human identity into targeted motion, yet their output quality remains irregular. Their optimal precision is achieved only when the physical compositions (i.e., scale and rotation) of the human shapes in the reference image and target pose frame are aligned. In the absence of such alignment, there is a noticeable decline in fidelity and consistency. In particular, this compositional misalignment occurs frequently in real-world environments, posing significant challenges to the practical usage of current systems. To this end, we propose Test-time Procrustes Calibration (TPC), which enhances the robustness of diffusion-based image animation systems by maintaining optimal performance even under compositional misalignment, effectively addressing real-world scenarios. TPC provides a calibrated reference image for the diffusion model, enhancing its capability to understand the correspondence between human shapes in the reference and target images. Our method is simple and can be applied to any diffusion-based image animation system in a model-agnostic manner, improving effectiveness at test time without additional training. |
| Researcher Affiliation | Academia | Sunjae Yoon Gwanhyeong Koo Younghwan Lee Chang D. Yoo Korea Advanced Institute of Science and Technology (KAIST) {sunjae.yoon,cd_yoo}@kaist.ac.kr |
| Pseudocode | Yes | A.1 Algorithm for Iterative Propagation; Algorithm 1: Iterative Propagation |
| Open Source Code | No | The paper states in the NeurIPS checklist that it submits 'source links for data used in our experiment and also a demo video,' but does not explicitly state that the source code for the proposed TPC method is provided for open access. |
| Open Datasets | Yes | We validate human image animation on two popular benchmarks (i.e., Tik Tok [14], TED-talks [25]) about a test split. |
| Dataset Splits | No | Since the benchmarks provide no validation splits, we construct validation sets matching the test set sizes for the ablation study. |
| Hardware Specification | Yes | We use Stable Diffusion 1.5 [23] for all baselines on 4 NVIDIA A100 GPUs. |
| Software Dependencies | No | SAM [16] is used for screening out the background in calibrated images. VQ-VAE [30] is used for encoding images of the video. We use Stable Diffusion 1.5 [23] for all baselines. |
| Experiment Setup | Yes | The number of groups in iterative propagation is chosen as M = 30 under ablation studies in Table 2. The average number of video frames is about 120. We follow the same pose encoders and image encoders of baseline models. |
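The paper's central operation is Procrustes calibration: aligning the scale and rotation of the human shape in the reference image to that of the target pose frame. As a rough illustration only (not the authors' implementation; the function names `procrustes_2d` and `apply_similarity` are hypothetical), a least-squares 2D similarity Procrustes fit on pose keypoints could be sketched as:

```python
import math

def procrustes_2d(ref_pts, tgt_pts):
    """Least-squares 2D similarity (Procrustes) fit: estimate the scale,
    rotation, and centroids that best map ref_pts onto tgt_pts."""
    n = len(ref_pts)
    # Centroids of both keypoint sets.
    rx = sum(p[0] for p in ref_pts) / n
    ry = sum(p[1] for p in ref_pts) / n
    tx = sum(p[0] for p in tgt_pts) / n
    ty = sum(p[1] for p in tgt_pts) / n
    a = b = norm = 0.0
    for (x, y), (u, v) in zip(ref_pts, tgt_pts):
        x, y, u, v = x - rx, y - ry, u - tx, v - ty
        a += x * u + y * v     # dot-product term (cos component)
        b += x * v - y * u     # cross-product term (sin component)
        norm += x * x + y * y  # squared norm of centered reference
    theta = math.atan2(b, a)          # optimal rotation angle
    scale = math.hypot(a, b) / norm   # optimal isotropic scale
    return scale, theta, (tx, ty), (rx, ry)

def apply_similarity(pts, scale, theta, tgt_c, ref_c):
    """Apply the fitted similarity transform to a list of 2D points."""
    c, s = math.cos(theta), math.sin(theta)
    out = []
    for x, y in pts:
        x, y = x - ref_c[0], y - ref_c[1]
        out.append((scale * (c * x - s * y) + tgt_c[0],
                    scale * (s * x + c * y) + tgt_c[1]))
    return out
```

In the paper's setting, a transform like this would be estimated from corresponding pose keypoints and then used to rescale/rotate the reference human shape before it is fed to the diffusion model; the actual TPC pipeline additionally uses SAM for background removal and iterative propagation across frames.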