Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. [K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026].

Motion4D: Learning 3D-Consistent Motion and Semantics for 4D Scene Understanding

Authors: Haoran Zhou, Gim Hee Lee

NeurIPS 2025

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | Extensive evaluations demonstrate that our Motion4D significantly outperforms both 2D foundation models and existing 3D-based approaches across diverse scene understanding tasks, including point-based tracking, video object segmentation, and novel view synthesis. Our code is available at https://hrzhou2.github.io/motion4d-web/. 39th Conference on Neural Information Processing Systems (NeurIPS 2025). 4 Experiments We evaluate Motion4D across diverse tasks, including video object segmentation, point-based tracking, and novel view synthesis, to demonstrate its ability to model motion and semantics in dynamic scenes. |
| Researcher Affiliation | Academia | Haoran Zhou Department of Computer Science National University of Singapore EMAIL Gim Hee Lee Department of Computer Science National University of Singapore EMAIL |
| Pseudocode | No | The paper describes the method and optimization pipeline through textual descriptions and figures (e.g., Figure 2: Overview of Motion4D), but does not contain explicitly labeled pseudocode blocks or algorithms. |
| Open Source Code | Yes | Our code is available at https://hrzhou2.github.io/motion4d-web/. |
| Open Datasets | Yes | Introducing DyCheck-VOS for Video Object Segmentation. We create a new VOS benchmark to evaluate segmentation performance in realistic and dynamic scenes by manually annotating the DyCheck dataset [12] with high-quality per-frame object masks. ... We also evaluate our method on the DAVIS dataset [30]. ... The DAVIS dataset, introduced for 2D point tracking by TAP-Vid [7], provides sparsely annotated point trajectories across real-world video sequences. |
| Dataset Splits | Yes | We use the DAVIS 2017 validation set and follow the standard setting of semi-supervised video object segmentation by providing the ground-truth object masks on the first frame as input. ... The DyCheck [12] dataset provides annotations of 5 to 15 keypoints sampled at equally spaced time steps for each sequence. |
| Hardware Specification | No | The main paper does not explicitly provide specific hardware details such as GPU/CPU models or memory specifications. The NeurIPS checklist states: 'We provide details about the compute resources used in the supplementary material alongside other implementation details.', implying this information is not in the main text. |
| Software Dependencies | No | The main paper does not explicitly list software dependencies with specific version numbers. The NeurIPS checklist (Question 6) indicates that 'full training and testing details' are provided in the 'supplemental material', suggesting they are not in the main paper. |
| Experiment Setup | Yes | The overall loss is defined as $L = \lambda_{\text{rgb}} L_{\text{rgb}} + \lambda_{\text{sem}} L_{\text{sem}} + \lambda_{\text{track}} L_{\text{track}} + \lambda_{\text{depth}} L_{\text{depth}} + \lambda_{w} L_{w}$, where each $\lambda$ is a hyperparameter to balance the loss terms. ... we compute per-pixel errors $e_{\text{rgb}}(p)$ and $e_{\text{sem}}(p)$ for the RGB and semantic views, respectively. We then select regions where $e_{\text{rgb}}(p) > \theta_{\text{rgb}}$ or $e_{\text{sem}}(p) > \theta_{\text{sem}}$. |
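
The Experiment Setup entry quotes two formulas: a weighted sum of five loss terms, and a region-selection rule that keeps pixels whose RGB or semantic error exceeds a threshold. The sketch below illustrates only that arithmetic in plain Python. It is not the authors' implementation: the function names `total_loss` and `select_regions`, the dictionary keys, and every numeric value are assumptions made for illustration, and the paper excerpt does not state the actual weights or thresholds.

```python
def total_loss(terms, weights):
    """Weighted sum of named loss terms: sum over k of weights[k] * terms[k]."""
    mismatched = set(terms) ^ set(weights)
    if mismatched:
        raise KeyError(f"terms and weights disagree on: {sorted(mismatched)}")
    return sum(weights[name] * terms[name] for name in terms)


def select_regions(e_rgb, e_sem, theta_rgb, theta_sem):
    """Per-pixel mask: True where e_rgb > theta_rgb or e_sem > theta_sem."""
    return [
        [r > theta_rgb or s > theta_sem for r, s in zip(row_rgb, row_sem)]
        for row_rgb, row_sem in zip(e_rgb, e_sem)
    ]


# Hypothetical values; none of these numbers come from the paper.
weights = {"rgb": 1.0, "sem": 0.5, "track": 2.0, "depth": 0.5, "w": 0.25}
terms = {"rgb": 0.5, "sem": 0.25, "track": 0.125, "depth": 1.0, "w": 2.0}
loss = total_loss(terms, weights)

# Two tiny 2x2 per-pixel error maps (hypothetical).
e_rgb = [[0.1, 0.6], [0.3, 0.2]]
e_sem = [[0.0, 0.1], [0.9, 0.2]]
mask = select_regions(e_rgb, e_sem, theta_rgb=0.5, theta_sem=0.5)

print(loss)  # 1.875
print(mask)  # [[False, True], [True, False]]
```

A pixel is selected when either error exceeds its threshold, so the mask is the union of the two per-view error regions. In practice these maps would likely be tensors on a GPU, but the logic is the same elementwise comparison.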