Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Controllable Human-centric Keyframe Interpolation with Generative Prior

Authors: Zujin Guo, Size Wu, Zhongang Cai, Wei Li, Chen Change Loy

NeurIPS 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We present quantitative and qualitative results in Sec. 5.1 to validate the effectiveness of our 3D control strategy in PoseFuse3D. We compare our interpolation performance against state-of-the-art methods in Sec. 5.2 and analyze the scalability across temporal gaps in Sec. 5.3. We further assess the in-the-wild interpolation capability in Sec. 5.4, where ground-truth control signals are not available. Finally, Sec. 5.5 provides a detailed ablation study to justify our model design.
Researcher Affiliation | Collaboration | Zujin Guo¹, Size Wu¹, Zhongang Cai², Wei Li¹, Chen Change Loy¹; ¹S-Lab, Nanyang Technological University; ²SenseTime Research
Pseudocode | No | The paper describes its methods through prose and architectural diagrams (Figure 2) but does not include any explicitly labeled 'Pseudocode' or 'Algorithm' blocks.
Open Source Code | No | Answer: [No]. Justification: We are unable to provide our code upon submission, but we plan to release it publicly in the future.
Open Datasets | No | Answer: [No]. Justification: At submission, we do not provide the source code or the dataset, but we will make them public after acceptance.
Dataset Splits | Yes | To prepare the train and test splits, we follow the original division for SportsSloMo [6] videos and distribute the Pexels videos according to their keyword frequencies to maintain balanced coverage of all motion categories.
Hardware Specification | Yes | For implementation, we leverage Fully Sharded Data Parallel (FSDP) across 4 GPUs.
Software Dependencies | No | We fine-tune the entire PoseFuse3D-KI framework end-to-end using the AdamW optimizer with a learning rate of 8 × 10⁻⁵. The paper mentions an optimizer but does not provide specific version numbers for libraries or other software dependencies used in the experiments.
Experiment Setup | Yes | We fine-tune PoseFuse3D-KI on the CHKI-Video training split for 70k iterations. Specifically, we fine-tune our 3D-informed control model, PoseFuse3D, and employ LoRA on the input patch embeddings, as well as on the value and output projections of the VDM's attention modules. During training, we randomly sample 25 consecutive frames from video clips and process them to a resolution of 512 × 320. For implementation, we leverage Fully Sharded Data Parallel (FSDP) across 4 GPUs. We fine-tune the entire PoseFuse3D-KI framework end-to-end using the AdamW optimizer with a learning rate of 8 × 10⁻⁵. Both the LoRA rank and LoRA alpha are set to 32.
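The Dataset Splits row says the Pexels videos are distributed according to keyword frequency so that every motion category stays covered. One plausible reading is a per-keyword (stratified) split, sketched below in plain Python. It assumes each video carries a single primary keyword; the helper name `split_by_keyword`, the 10% test fraction, and the seed are hypothetical illustrations, not values reported by the paper. SportsSloMo clips keep their original division and are not handled here.

```python
import random
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def split_by_keyword(
    videos: Iterable[Tuple[str, str]],
    test_fraction: float = 0.1,  # hypothetical; the paper gives no ratio
    seed: int = 0,               # hypothetical; fixed for reproducibility
) -> Tuple[List[str], List[str]]:
    """Split (video_id, keyword) pairs into train/test ids, keyword by keyword,
    so each keyword appears in both splits roughly in proportion to its count."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for video_id, keyword in videos:
        groups[keyword].append(video_id)

    rng = random.Random(seed)
    train: List[str] = []
    test: List[str] = []
    for keyword in sorted(groups):  # sorted so iteration order is deterministic
        ids = sorted(groups[keyword])
        rng.shuffle(ids)
        n_test = round(len(ids) * test_fraction)
        # A keyword with several clips always contributes at least one test
        # clip; a keyword with a single clip stays in train.
        if n_test == 0 and len(ids) > 1:
            n_test = 1
        test.extend(ids[:n_test])
        train.extend(ids[n_test:])
    return train, test


videos = (
    [(f"run_{i}", "running") for i in range(20)]
    + [(f"dance_{i}", "dancing") for i in range(10)]
    + [("yoga_0", "yoga")]
)
train, test = split_by_keyword(videos)
print(len(train), len(test))  # 28 3
```

Splitting within each keyword group, rather than over the pooled list, keeps rare categories from disappearing from the test split by chance.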
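The Experiment Setup row reports LoRA on the input patch embeddings and on the value and output projections of the VDM's attention modules, with rank and alpha both set to 32. Below is a minimal pure-Python sketch of a standard LoRA-adapted linear projection. It assumes the common convention of scaling the low-rank update by alpha / rank (so rank = alpha = 32 gives a scale of 1.0) and a zero-initialized up-projection B; neither detail is stated in the row. The function names and toy matrices are hypothetical, and this is not the authors' code.

```python
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def matvec(m: Matrix, x: Vector) -> Vector:
    """Dense matrix-vector product."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in m]


def lora_forward(W: Matrix, A: Matrix, B: Matrix, x: Vector,
                 rank: int, alpha: float) -> Vector:
    """y = W x + (alpha / rank) * B (A x).

    W: frozen base weight, shape (d_out, d_in)
    A: trainable down-projection, shape (rank, d_in)
    B: trainable up-projection, shape (d_out, rank)
    """
    scale = alpha / rank
    base = matvec(W, x)              # frozen path
    delta = matvec(B, matvec(A, x))  # trainable low-rank path
    return [b + scale * d for b, d in zip(base, delta)]


def merge_lora(W: Matrix, A: Matrix, B: Matrix,
               rank: int, alpha: float) -> Matrix:
    """Fold the adapter into the base weight: W' = W + (alpha / rank) * B A."""
    scale = alpha / rank
    d_out, d_in = len(W), len(W[0])
    return [
        [W[i][j] + scale * sum(B[i][k] * A[k][j] for k in range(rank))
         for j in range(d_in)]
        for i in range(d_out)
    ]


RANK, ALPHA = 32, 32  # values reported in the Experiment Setup row

W = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]                          # toy 2x3 projection
A = [[0.01 * (k + j) for j in range(3)] for k in range(RANK)]  # toy down-projection
B_init = [[0.0] * RANK for _ in range(2)]  # zero init: adapter starts as a no-op
x = [1.0, 0.0, -1.0]

print(lora_forward(W, A, B_init, x, RANK, ALPHA) == matvec(W, x))  # True
```

Only A and B would be trained; after fine-tuning, `merge_lora` shows how the update can be folded back into a single weight so inference costs nothing extra.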
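The training settings reported in the Experiment Setup and Software Dependencies rows can be collected in one place. The dataclass below is a hypothetical container (its name and field names, including the LoRA target labels, are our own). It records only the numbers stated above and derives two quantities from them. Optimizer settings beyond the learning rate (betas, weight decay), the batch size, the GPU model, and software versions are not reported and are deliberately left out.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class PoseFuse3DKITrainConfig:
    """Hypothetical record of the reported PoseFuse3D-KI fine-tuning settings."""
    train_data: str = "CHKI-Video (training split)"
    iterations: int = 70_000
    frames_per_sample: int = 25               # consecutive frames sampled per clip
    resolution: Tuple[int, int] = (512, 320)  # reported as 512 x 320
    optimizer: str = "AdamW"
    learning_rate: float = 8e-5
    num_gpus: int = 4
    sharding: str = "FSDP"
    lora_rank: int = 32
    lora_alpha: int = 32
    # Descriptive labels only, not real module names from the codebase.
    lora_targets: Tuple[str, ...] = (
        "input patch embeddings",
        "attention value projections",
        "attention output projections",
    )

    @property
    def lora_scale(self) -> float:
        # Common LoRA convention: the low-rank update is scaled by alpha / rank.
        return self.lora_alpha / self.lora_rank

    @property
    def pixels_per_sample(self) -> int:
        # Pixels per training clip across all sampled frames.
        a, b = self.resolution
        return self.frames_per_sample * a * b


cfg = PoseFuse3DKITrainConfig()
print(cfg.lora_scale, cfg.pixels_per_sample)  # 1.0 4096000
```

Freezing the dataclass keeps the recorded settings from being mutated accidentally during an experiment script.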