Sparse-view Pose Estimation and Reconstruction via Analysis by Generative Synthesis

Authors: Qitao Zhao, Shubham Tulsiani

NeurIPS 2024

Reproducibility assessment (variable, result, and LLM response):
Research Type: Experimental
LLM response: We validate our framework across real-world and synthetic datasets in combination with several off-the-shelf pose estimation systems as initialization. We find that it significantly improves the base systems' pose accuracy while yielding high-quality 3D reconstructions that outperform current multi-view reconstruction baselines.

Researcher Affiliation: Academia
LLM response: Qitao Zhao, Shubham Tulsiani (Carnegie Mellon University)

Pseudocode: No
LLM response: The paper describes the overall framework and its components (e.g., DreamGaussian-based reconstruction and outlier identification) in narrative text and equations, but does not include a formal pseudocode block or an algorithm listing.

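Since the paper offers no algorithm listing, the loop it describes can only be sketched from the quoted passages. The sketch below is a hypothetical reconstruction: optimize_scene and reprojection_error are assumed placeholders, and the loop structure is inferred solely from the quoted outlier and termination rules, not from the authors' released code.

```python
from typing import Callable, List, Sequence, Tuple

def analysis_by_synthesis(
    images: Sequence,               # N input views
    init_poses: List,               # poses from an off-the-shelf estimator
    optimize_scene: Callable,       # fits the 3D representation and poses
                                    # (photometric + SDS losses, per the paper)
    reprojection_error: Callable,   # e.g., LPIPS between a view and its rendering
    min_inliers: int,               # 4 for N=6/8, 6 for N=10, 12 for N=16 (quoted)
    error_threshold: float = 0.05,  # LPIPS outlier threshold quoted in the paper
) -> Tuple:
    """Hypothetical sketch of the analysis-by-generative-synthesis loop."""
    poses = list(init_poses)
    inliers = list(range(len(images)))
    prev_mean_error = float("inf")
    while True:
        # Refine the 3D scene and camera poses on the current inlier set.
        scene, poses = optimize_scene(images, poses, inliers)

        # Score every view against the current reconstruction (the paper's
        # inequality 8 uses LPIPS with a 0.05 threshold).
        errors = [reprojection_error(scene, pose, img)
                  for pose, img in zip(poses, images)]
        inliers = [i for i, e in enumerate(errors) if e <= error_threshold]

        # Quoted termination rule: stop when the average reprojection error
        # stops decreasing by more than the threshold, or when too few
        # inliers remain.
        mean_error = sum(errors) / len(errors)
        if (prev_mean_error - mean_error < error_threshold
                or len(inliers) < min_inliers):
            break
        prev_mean_error = mean_error
    return scene, poses, inliers
```

A caller would plug in the actual renderer plus LPIPS metric for reprojection_error and the joint pose-and-3D optimizer for optimize_scene; both are stand-ins here.
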
Open Source Code: No
LLM response: "We, unfortunately, are not able to clear data and codes before the camera-ready deadline. However, we will release our code as well as the data used soon."

Open Datasets: Yes
LLM response: We primarily evaluate our method on NAVI [9], a real-world multi-view object-centric dataset with high-quality foreground masks, precise camera poses, and 3D meshes. For each of the 35 objects in NAVI, we randomly select 5 multi-view sequences for pose estimation and reconstruction. Additionally, we assess our method on synthetic datasets, including GSO [7], ABO [4], and OmniObject3D [44].

Dataset Splits: No
LLM response: The paper uses NAVI, GSO, ABO, and OmniObject3D for evaluation and runs experiments with varying numbers of input images (N = 6, 8, 10, 16), but it does not explicitly state train/validation/test splits (e.g., percentages or sample counts) for these datasets.

Hardware Specification: Yes
LLM response: For 8-image inference using a single RTX A5000 GPU... We used 8 V100 GPUs, setting a batch size of 36 per GPU with a gradient accumulation of 6.

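If the quoted training figures combine in the usual way, the effective batch size works out to 8 GPUs × 36 samples per GPU × 6 accumulation steps = 1,728 samples per optimizer update.
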
Software Dependencies: No
LLM response: The paper mentions specific models such as Zero-1-to-3 [20] and DreamGaussian [36], and initializes weights from the Zero123-XL checkpoint [5], but it does not provide version numbers for general software dependencies such as Python, PyTorch, or CUDA.

Experiment Setup: Yes
LLM response: The first stage optimizes 3D Gaussians [14] (parameterized by θ) using a combination of photometric loss (Eq. 1, except that the camera pose is not optimized) and SDS loss (Eq. 3) with a view-conditioned diffusion model, Zero-1-to-3 [20]. This stage efficiently builds the geometry of the object with rough texture and takes 500 training steps (about 1 minute). In the second stage, the 3D Gaussians are converted to a textured mesh with Marching Cubes [21], and only its texture is optimized; this stage takes another 50 steps and finishes within 30 seconds on a single GPU. ... For the outlier condition specified in inequality 8, we employ LPIPS as the reprojection error metric, applying a threshold of 0.05. The reconstruction loop terminates when the average reprojection error reduction falls below this threshold or if the number of estimated inliers drops below a predefined count. Specifically, we use a threshold of 4 inliers for N = 6 and N = 8, 6 inliers for N = 10, and 12 inliers for N = 16.
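
The quoted two-stage schedule can likewise be sketched. In the snippet below, photometric_loss, sds_loss, marching_cubes, and optimizer_factory are assumed PyTorch-style helpers rather than the authors' code, and the stage-2 loss is an assumption since the quote does not name it.

```python
def two_stage_reconstruction(gaussians, images, poses,
                             diffusion_model,        # e.g., Zero-1-to-3
                             photometric_loss, sds_loss,
                             marching_cubes, optimizer_factory):
    """Hypothetical sketch of the quoted schedule: 500 Gaussian steps,
    then mesh extraction and 50 texture-only steps."""
    # Stage 1 (~1 minute, 500 steps): optimize the 3D Gaussians (theta)
    # with photometric + SDS losses; per the quote, camera poses are
    # held fixed during this reconstruction stage.
    opt = optimizer_factory(gaussians.parameters())
    for _ in range(500):
        loss = (photometric_loss(gaussians, images, poses)
                + sds_loss(gaussians, diffusion_model))
        opt.zero_grad()
        loss.backward()
        opt.step()

    # Stage 2 (~30 seconds, 50 steps): convert the Gaussians to a textured
    # mesh via Marching Cubes and optimize only the texture. A photometric
    # term is assumed here; the quote does not specify the stage-2 loss.
    mesh = marching_cubes(gaussians)
    opt = optimizer_factory(mesh.texture_parameters())
    for _ in range(50):
        loss = photometric_loss(mesh, images, poses)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return mesh
```

Read together with the quoted termination rule, this per-round reconstruction would sit inside the outlier loop sketched earlier, though the paper's prose is the only basis for that reading.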