Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. [K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026].

PoseCrafter: Extreme Pose Estimation with Hybrid Video Synthesis

Authors: Qing Mao, Tianxin Huang, Yu Zhu, Jinqiu Sun, Yanning Zhang, Gim Hee Lee

NeurIPS 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments on Cambridge Landmarks, ScanNet, DL3DV-10K, and NAVI demonstrate that, compared to existing SOTA methods, PoseCrafter can obviously enhance the pose estimation performances, especially on examples with small or no overlap. Extensive experiments on common benchmarks, including Cambridge Landmarks [11], ScanNet [12], DL3DV-10K [13], and NAVI [14] show that PoseCrafter can obviously improve the accuracy of pose estimation on extreme pose image pairs with small or no overlaps, without any requirements for additional training or ground-truth supervision. Section 4 is dedicated to 'Experiments' including 'Experiment Setup', 'Implementation Details', 'Comparison with State-of-the-Art', 'Runtime and Memory Cost Discussion', and 'Ablation Studies'.
Researcher Affiliation | Academia | (1) School of Computer Science, Northwestern Polytechnical University; (2) School of Computing, National University of Singapore; (3) School of Computing and Data Science, The University of Hong Kong; (4) School of Astronautics, Northwestern Polytechnical University
Pseudocode | No | The paper describes the method using diagrams (e.g., Figure 1: Overview of the PoseCrafter pipeline) and textual descriptions in Section 3 'Our Method', but does not include any explicitly labeled 'Pseudocode' or 'Algorithm' blocks.
Open Source Code | No | The paper header contains 'https://github.com/maoqingsunny/PoseCrafter'; however, in the NeurIPS Paper Checklist, the authors state: 'Justification: We will open my code if our work is accepted.' This indicates the code is not yet publicly available at the time of submission.
Open Datasets | Yes | Extensive experiments on common benchmarks, including Cambridge Landmarks [11], ScanNet [12], DL3DV-10K [13], and NAVI [14] show that PoseCrafter can obviously improve the accuracy of pose estimation on extreme pose image pairs with small or no overlaps, without any requirements for additional training or ground-truth supervision.
Dataset Splits | No | For Cambridge Landmarks and ScanNet, we select test pairs by sampling images whose relative yaw difference falls into two ranges ([50°, 65°] and [65°, 90°]) to evaluate performance under small and no overlap cases. For the object-centric Navi and DL3DV-10K datasets, due to the large object overlap, we adopt a single yaw range of [50°, 90°] following InterPose's setting. This describes the selection criteria for test pairs but does not specify full training/validation splits or their percentages/counts.
Hardware Specification | Yes | All experiments were conducted on a single NVIDIA RTX 6000 GPU to ensure a consistent evaluation environment.
Software Dependencies | No | The paper mentions using pre-trained models like DynamiCrafter [6], ViewCrafter [9], and DUSt3R [3], but does not specify versions for general software dependencies such as programming languages (e.g., Python), deep learning frameworks (e.g., PyTorch), or CUDA.
Experiment Setup | Yes | In the hybrid video generation stage, we first generate 16 interpolated frames between each input image pair using DynamiCrafter. From these, we select 4 frames as reliable relay frames (as described in section 3.2), which are then used by ViewCrafter to render 25-frame sequences for subsequent selection with Feature Matching Selector (FMS). In FMS, we extract 2,000 ORB keypoints per frame and compute RANSAC inlier counts with respect to both the start and end keyframes. The generated frames are then ranked based on their inlier counts, and the top k = 6 frames (excluding the input image pair) are selected for the final pose estimation.
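
The ranking step of the Feature Matching Selector quoted in the Experiment Setup row can be illustrated with a minimal sketch. This is not the authors' code: the names `FrameScore` and `select_top_frames` are hypothetical, the inlier counts are made-up numbers standing in for the ORB/RANSAC matching results, and combining the two counts by summation is an assumption, since the quoted text only says frames are "ranked based on their inlier counts".

```python
from typing import NamedTuple


class FrameScore(NamedTuple):
    """RANSAC inlier counts of one generated frame against both keyframes."""
    frame_id: int
    inliers_to_start: int
    inliers_to_end: int


def select_top_frames(scores: list[FrameScore], k: int = 6) -> list[int]:
    """Return the ids of the k frames with the highest combined inlier count.

    Assumption: the two counts are combined by summation. The quoted text
    does not say how the start and end counts are merged.
    """
    ranked = sorted(
        scores,
        key=lambda s: s.inliers_to_start + s.inliers_to_end,
        reverse=True,
    )
    return [s.frame_id for s in ranked[:k]]


# Made-up inlier counts for eight generated frames (illustration only).
example_scores = [
    FrameScore(0, 120, 15),
    FrameScore(1, 90, 40),
    FrameScore(2, 70, 66),
    FrameScore(3, 50, 82),
    FrameScore(4, 30, 110),
    FrameScore(5, 10, 5),
    FrameScore(6, 60, 60),
    FrameScore(7, 20, 25),
]

# k = 6 matches the value reported in the Experiment Setup row.
selected = select_top_frames(example_scores, k=6)
print(selected)
```

Other ways of merging the two counts (for example, taking the minimum so that a frame must match both keyframes reasonably well) would fit the quoted description equally well; only the key function passed to `sorted` would change.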