Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026.
Sparse-view Pose Estimation and Reconstruction via Analysis by Generative Synthesis
Authors: Qitao Zhao, Shubham Tulsiani
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We validate our framework across real-world and synthetic datasets in combination with several off-the-shelf pose estimation systems as initialization. We find that it significantly improves the base systems' pose accuracy while yielding high-quality 3D reconstructions that outperform the results from current multi-view reconstruction baselines. |
| Researcher Affiliation | Academia | Qitao Zhao Shubham Tulsiani Carnegie Mellon University |
| Pseudocode | No | The paper describes the overall framework and components (MV-DreamGaussian, GD, outlier identification) in narrative text and using equations, but does not include a formal pseudocode block or an algorithm listing. |
| Open Source Code | No | We, unfortunately, are not able to clear data and codes before the camera-ready deadline. However, we will release our code as well as the data used soon. |
| Open Datasets | Yes | We primarily evaluate our method on a real-world multi-view object-centric dataset NAVI [9]. This dataset includes high-quality foreground masks, precise camera poses, and 3D meshes. For each of the 35 objects in NAVI, we randomly select 5 multi-view sequences for pose estimation and reconstruction. Additionally, we assess our method on synthetic datasets, including GSO [7], ABO [4], and OmniObject3D [44]. |
| Dataset Splits | No | The paper mentions using datasets like NAVI, GSO, ABO, and OmniObject3D for evaluation and conducting experiments with varying numbers of input images (N = 6, 8, 10, 16). However, it does not explicitly state the train/validation/test splits (e.g., percentages or sample counts) for these datasets within the paper's experimental setup. |
| Hardware Specification | Yes | For 8-image inference using a single RTX A5000 GPU... We used 8 V100 GPUs, setting a batch size of 36 per GPU with a gradient accumulation of 6. |
| Software Dependencies | No | The paper mentions using specific models like "Zero-1-to-3 [20]" and "DreamGaussian [36]" and initializing weights from "Zero123-XL checkpoint [5]". However, it does not provide version numbers for general software dependencies such as Python, PyTorch, or CUDA. |
| Experiment Setup | Yes | The first stage optimizes 3D Gaussians [14] (parameterized by θ) using a combination of photometric loss (Eq. 1, except that the camera pose is not optimized) and SDS loss (Eq. 3) with a view-conditioned diffusion model, Zero-1-to-3 [20]... This stage efficiently builds the geometry of the object with rough texture, which takes 500 training steps (in about 1 minute). In the second stage, 3D Gaussians are converted to a textured mesh with Marching Cubes [21], and only its texture is optimized. This stage takes another 50 steps and can finish within 30 seconds on a single GPU. ... For the outlier condition specified in inequality 8, we employ LPIPS as the reprojection error metric, applying a threshold of 0.05. The reconstruction loop terminates when the average reprojection error reduction falls below this threshold or if the number of estimated inliers drops below a predefined count. Specifically, we use a threshold of 4 inliers for N = 6 and N = 8, 6 inliers for N = 10, and 12 inliers for N = 16. |
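
The termination rule quoted in the Experiment Setup row can be illustrated with a minimal sketch. This is not the authors' code (which had not been released at the time of the quoted response); the function name `should_terminate` and its arguments are hypothetical, and only the 0.05 threshold and the per-N inlier counts come from the quoted text. Strict "less than" comparisons are assumed from the wording "falls below" and "drops below".

```python
# Minimum inlier counts quoted in the Experiment Setup row,
# keyed by the number of input images N.
MIN_INLIERS = {6: 4, 8: 4, 10: 6, 16: 12}

# Threshold applied to the average LPIPS reprojection error reduction.
ERROR_REDUCTION_THRESHOLD = 0.05


def should_terminate(avg_error_reduction: float,
                     num_inliers: int,
                     num_images: int) -> bool:
    """Hypothetical helper: decide whether the reconstruction loop stops.

    The loop stops when the average reprojection error reduction falls
    below the threshold, or when the number of estimated inliers drops
    below the count listed for this N.
    """
    if num_images not in MIN_INLIERS:
        # Only N = 6, 8, 10, 16 are reported; other values are not guessed.
        raise ValueError(f"no inlier threshold reported for N = {num_images}")
    too_little_progress = avg_error_reduction < ERROR_REDUCTION_THRESHOLD
    too_few_inliers = num_inliers < MIN_INLIERS[num_images]
    return too_little_progress or too_few_inliers
```

For example, with N = 8 the loop would continue at an error reduction of 0.10 with 6 inliers, but would stop if the reduction fell to 0.01 or the inlier count fell to 3.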