Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

EverybodyDance: Bipartite Graph–Based Identity Correspondence for Multi-Character Animation

Authors: Haotian Ling, Zequn Chen, Qiuying Chen, Donglin Di, Yongjia Ma, Hao Li, Chen Wei, Zhulin Tao, Xun Yang

NeurIPS 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Extensive experiments demonstrate that EverybodyDance substantially outperforms state-of-the-art baselines in both IC and visual fidelity. Our approach significantly outperforms SOTA baselines in both IC accuracy and visual fidelity. We evaluate our method, EverybodyDance, on the ICE-Bench using both quantitative metrics and qualitative showcases.
Researcher Affiliation	Collaboration	(1) University of Science and Technology of China; (2) Li Auto; (3) Communication University of China; (4) MoE Key Laboratory of Brain-inspired Intelligent Perception and Cognition, USTC
Pseudocode	Yes	Algorithm 1: Training Pipeline of EverybodyDance ... Algorithm 2: Inference Pipeline of EverybodyDance
Open Source Code	No	We will release the code after publication.
Open Datasets	Yes	To validate the generalizability of our method, we also conducted comparisons with the publicly available benchmark provided by Follow-Your-Pose-V2 [50]. This benchmark is distinguished by frequent inter-person occlusions. ... we benchmarked its performance on challenging multi-character videos containing 3 to 5 individuals, and on the widely used single-character TikTok [62] benchmark.
Dataset Splits	No	The paper states that Stage 2 training sequences comprise 24 consecutive frames sampled at 3-frame intervals and that ICE-Bench contains 3,200 video frames, but it does not report train/validation/test split percentages or sample counts for any of the datasets used.
Hardware Specification	No	The paper does not provide specific hardware details such as GPU/CPU models, memory, or processor types. It only generally states in the acknowledgments that Li Auto provided "essential computational resources".
Software Dependencies	No	The paper mentions several tools and models like VAE [66], CLIP [67], Adam [68], DDIM [20], DWPose [54], and SAM2 [52], but it does not specify version numbers for any underlying programming languages, libraries, or frameworks (e.g., Python, PyTorch, CUDA).
Experiment Setup	Yes	Stage 1 was trained for 5,000 steps while Stage 2 was trained for 1,000 steps, with the VAE [66] encoder and CLIP [67] encoder frozen throughout. In Stage 1, the Reference Net, Denoising Net, and Pose Guider are trainable. A batch size of 32 is applied, using center-cropped 768 × 768 resolution images. For Stage 2, only the temporal attention modules are trainable, with a batch size of 8 and training sequences comprising 24 consecutive frames sampled at 3-frame intervals. Video frames are processed at 512 × 512 resolution. We set the learning rate to 2.0e-5 and use the Adam [68] optimizer. During the inference phase, we employ the DDIM scheduler with 50 denoising steps. We set the classifier-free guidance [69] scale to 3.5.
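Because the code has not been released, the sketch below simply collects the hyperparameters reported in the Experiment Setup row into one place. It is not the authors' implementation. All field, constant, and function names are hypothetical. Two points are assumptions: that "24 consecutive frames sampled at 3-frame intervals" means 24 frames taken with a stride of 3 from the source video, and that guidance uses the commonly used classifier-free guidance form eps_uncond + s * (eps_cond - eps_uncond).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StageConfig:
    """Training settings for one stage (field names are hypothetical)."""
    name: str
    steps: int
    batch_size: int
    resolution: tuple  # (height, width) in pixels
    trainable: tuple   # modules updated during this stage


# Values transcribed from the reported setup; module names are illustrative labels.
STAGE_1 = StageConfig("stage1", 5_000, 32, (768, 768),
                      ("reference_net", "denoising_net", "pose_guider"))
STAGE_2 = StageConfig("stage2", 1_000, 8, (512, 512), ("temporal_attention",))
FROZEN = ("vae_encoder", "clip_encoder")  # frozen in both stages
LEARNING_RATE = 2.0e-5                     # used with the Adam optimizer
DDIM_STEPS = 50                            # DDIM denoising steps at inference
CFG_SCALE = 3.5                            # classifier-free guidance scale


def stage2_clip_indices(start, num_frames=24, stride=3):
    """Source-frame indices for one Stage 2 clip.

    Assumes the clip is num_frames frames taken every `stride` frames
    starting at `start` (an interpretation of the paper's wording).
    """
    return [start + i * stride for i in range(num_frames)]


def classifier_free_guidance(eps_uncond, eps_cond, scale=CFG_SCALE):
    """Commonly used classifier-free guidance combination of noise predictions.

    Plain arithmetic, so it also works element-wise on NumPy arrays or tensors.
    """
    return eps_uncond + scale * (eps_cond - eps_uncond)


if __name__ == "__main__":
    clip = stage2_clip_indices(0)
    print(len(clip), clip[0], clip[-1])  # 24 frames, indices 0 through 69
```

Under this reading, one Stage 2 clip spans 70 source frames (indices 0, 3, ..., 69 when starting at 0), which is useful to know when checking whether short videos can supply full clips.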