Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
MimicMotion: High-Quality Human Motion Video Generation with Confidence-aware Pose Guidance
Authors: Yuang Zhang, Jiaxi Gu, Li-Wen Wang, Han Wang, Junqi Cheng, Yuefeng Zhu, Fangyuan Zou
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments demonstrate the effectiveness of our approach in producing high-quality human motion videos. Videos and comparisons are available at https://tencent.github.io/MimicMotion. |
| Researcher Affiliation | Collaboration | 1Tencent 2Shanghai Jiao Tong University. Correspondence to: Jiaxi Gu <EMAIL>. |
| Pseudocode | Yes | Algorithm 1 Progressive latent fusion for long videos. |
| Open Source Code | No | The abstract states: "Videos and comparisons are available at https://tencent.github.io/MimicMotion." This link points to a project demonstration page, not a specific code repository for the methodology described in the paper. No other explicit statement about code release or repository link for the authors' own method was found. |
| Open Datasets | Yes | We evaluate performance on test sequences from the Tik Tok dataset (Jafarian & Park, 2021). Jafarian, Y. and Park, H. S. Learning high fidelity depths of dressed humans by watching social media dance videos. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 12753–12762, June 2021. |
| Dataset Splits | Yes | For the testing protocol of previous works (Wang et al., 2023; Chang et al., 2023), we adopt the Tik Tok (Jafarian & Park, 2021) dataset and use sequence 335 to 340 for our evaluation. |
| Hardware Specification | Yes | We train our model on 8 NVIDIA A100 GPUs for 20 epochs, with a batch size of 8 and 16 frames per clip. |
| Software Dependencies | No | No specific software dependencies with version numbers (e.g., Python version, library versions) are mentioned in the paper. |
| Experiment Setup | Yes | We train our model on 8 NVIDIA A100 GPUs for 20 epochs, with a batch size of 8 and 16 frames per clip. The loss weight of the hand region is 10. The learning rate is 10^-5 with a linear warmup of 500 iterations. We tune all parameters in the UNet and Pose Net. We follow Stable Video Diffusion and adopt the noise distribution, i.e. log σ ∼ N(P_mean, P_std²), proposed by Karras et al. (Karras et al., 2022) with parameters P_mean = 0.5 and P_std = 1.4. |
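The noise distribution quoted in the Experiment Setup row (log σ ∼ N(P_mean, P_std²), following Karras et al., 2022, with P_mean = 0.5 and P_std = 1.4) can be sketched in a few lines. This is an illustrative reimplementation of that sampling rule only, not code from the paper; the function name is hypothetical.

```python
import math
import random
from typing import Optional

def sample_noise_level(p_mean: float = 0.5,
                       p_std: float = 1.4,
                       rng: Optional[random.Random] = None) -> float:
    """Draw one EDM-style noise level sigma with log sigma ~ N(p_mean, p_std^2).

    Defaults match the values the paper reports (P_mean = 0.5, P_std = 1.4).
    """
    rng = rng or random.Random()
    # Sample log sigma from a Gaussian, then exponentiate to get sigma > 0.
    return math.exp(rng.gauss(p_mean, p_std))
```

Averaging log σ over many draws recovers P_mean, which is a quick sanity check that the log-normal parameterization is implemented as stated.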