Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

Stable Part Diffusion 4D: Multi-View RGB and Kinematic Parts Video Generation

Authors: Hao Helen Zhang, Chun-Han Yao, Simon Donné, Narendra Ahuja, Varun Jampani

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments show that SP4D generalizes strongly to diverse scenarios, including real-world videos, novel generated objects, and rare articulated poses, producing kinematic-aware outputs suitable for downstream animation and motion-related tasks. We conduct comprehensive experiments to evaluate the effectiveness of our method, including comparisons with state-of-the-art approaches for part segmentation, as well as ablation studies on key design choices. We report both quantitative metrics (mIoU, ARI, F1 Score, mAcc) in Table 1 and a user study in Table 2 to assess quality from a rigging perspective.
Researcher Affiliation | Collaboration | Hao Zhang (1,2), Chun-Han Yao (1), Simon Donné (1), Narendra Ahuja (2), Varun Jampani (1); (1) Stability AI, (2) University of Illinois Urbana-Champaign
Pseudocode | No | The paper describes the methodology in prose and mathematical equations but does not include any clearly labeled pseudocode or algorithm blocks.
Open Source Code | No | Question: Does the paper provide open access to the data and code, with sufficient instructions to faithfully reproduce the main experimental results, as described in supplemental material? Answer: [No] Justification: Unfortunately, we cannot commit to releasing the code at this time.
Open Datasets | No | To train and evaluate SP4D, we construct KinematicParts20K, a curated dataset of over 20K rigged objects selected and processed from Objaverse-XL (Deitke et al., 2023), each paired with multi-view RGB and part video sequences. We curate KinematicParts20K, a large-scale dataset of over 20K rigged objects with paired RGB and part video annotations to support training and evaluation.
Dataset Splits | No | Table 1: Quantitative comparison of kinematic parts on KinematicParts20K val set for multi-view (static object) and multi-frame (static camera). To quantitatively assess this gap, we conduct a comprehensive evaluation on the KinematicParts20K test set using these SOTA 3D segmentation baselines. While the paper mentions 'val set' and 'test set' for KinematicParts20K, it does not provide specific details on the splitting methodology, percentages, or sample counts for these splits.
Hardware Specification | Yes | Training is performed on 32 NVIDIA H100 GPUs with an effective batch size of 32, using 12 views and 4 frames per object sampled from the rendered dataset.
Software Dependencies | No | Our model is implemented by directly extending the SV4D 2.0 framework (Yao et al., 2025). We adopt the EDM (Karras et al., 2022) training framework with an L2 loss and precompute VAE latents and CLIP features for all training images to accelerate convergence. While frameworks and components are mentioned, specific version numbers for software, libraries, or programming languages are not provided.
Experiment Setup | Yes | We train the full SP4D model with BiDiFuse and our proposed contrastive part consistency loss on the KinematicParts20K dataset (as discussed below) for 40K iterations. Training is performed on 32 NVIDIA H100 GPUs with an effective batch size of 32, using 12 views and 4 frames per object sampled from the rendered dataset.
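
The reported training figures can be combined into a few derived quantities. The sketch below is illustrative only: the class name `TrainingSetup` and its method names are hypothetical and do not come from the paper or the SV4D 2.0 codebase. It also assumes that the effective batch size counts objects, that the batch is split evenly across GPUs, and that no gradient accumulation is used; the paper does not state any of these.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainingSetup:
    """Figures quoted in the Hardware Specification and Experiment Setup rows."""

    num_gpus: int = 32              # NVIDIA H100 GPUs
    effective_batch_size: int = 32  # assumed to count objects, not images
    views_per_object: int = 12
    frames_per_object: int = 4
    iterations: int = 40_000        # "40K iterations"

    def images_per_object(self) -> int:
        # Each sampled object contributes a views x frames grid of images.
        return self.views_per_object * self.frames_per_object

    def objects_per_gpu_per_step(self) -> float:
        # Assumption: even split across GPUs, no gradient accumulation.
        return self.effective_batch_size / self.num_gpus

    def total_object_samples(self) -> int:
        # Object samples drawn over the whole run (with repetition).
        return self.iterations * self.effective_batch_size


setup = TrainingSetup()
print("images per object:", setup.images_per_object())
print("objects per GPU per step:", setup.objects_per_gpu_per_step())
print("object samples over training:", setup.total_object_samples())
```

Under these assumptions, each GPU processes one object (a 12 x 4 grid of images) per step. The run draws about 1.28M object samples from a dataset of over 20K objects, so each object would be revisited many times, presumably with different view and frame samples.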