SCube: Instant Large-Scale Scene Reconstruction using VoxSplats

Authors: Xuanchi Ren, Yifan Lu, Hanxue Liang, Jay Zhangjie Wu, Huan Ling, Mike Chen, Sanja Fidler, Francis Williams, Jiahui Huang

NeurIPS 2024

Reproducibility Variable Result LLM Response
Research Type Experimental We evaluate our performance on the Waymo Open Dataset [53] on the challenging task of reconstructing a scene from sparse images with low overlap. We show that SCube significantly outperforms existing methods on this task. and 4 Experiments In this section, we validate the effectiveness of SCube. First, we present our new data curation pipeline that produces ground-truth voxel grids (Sec. 4.1). Next, we demonstrate SCube's capabilities in scene reconstruction (Sec. 4.2), and further highlight its usefulness in assisting the state-of-the-art Gaussian splatting pipeline (Sec. 4.3). Finally, we showcase other applications of our method (Sec. 4.4) and perform ablation studies to justify our design choices (Sec. 4.5).
Researcher Affiliation Collaboration Xuanchi Ren1,2,3, Yifan Lu1,4, Hanxue Liang1,5, Zhangjie Wu1,6, Huan Ling1,2,3, Mike Chen1, Sanja Fidler1,2,3, Francis Williams1, Jiahui Huang1. Affiliations: 1NVIDIA, 2University of Toronto, 3Vector Institute, 4Shanghai Jiao Tong University, 5University of Cambridge, 6National University of Singapore
Pseudocode No The paper does not contain structured pseudocode or algorithm blocks (clearly labeled algorithm sections or code-like formatted procedures).
Open Source Code No Due to institutional constraints, we are not able to release the code until the paper is fully accepted. Upon acceptance, we will release all code and data required to reproduce this work.
Open Datasets Yes We evaluate our performance on the Waymo Open Dataset [53] on the challenging task of reconstructing a scene from sparse images with low overlap.
Dataset Splits Yes Our dataset contains 20243 chunks for training and 5380 chunks for evaluation, out of the 798 training and 202 validation sequences.
Hardware Specification Yes We train both coarse-level and fine-level voxel latent diffusion models with 64 NVIDIA Tesla A100s for 2 days. For the appearance reconstruction model, we train it using 8 NVIDIA Tesla A100s for 2 days.
Software Dependencies No We train all of our models using the Adam [24] optimizer with β1 = 0.9 and β2 = 0.999. We use PyTorch Lightning [10] for building our distributed training framework.
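The quoted row only specifies the Adam moment coefficients (β1 = 0.9, β2 = 0.999, which are also PyTorch's defaults). A minimal plain-Python sketch of a single Adam update with those coefficients, assuming a default learning rate and epsilon that the excerpt does not report:

```python
# Sketch of one Adam update step. Only beta1 and beta2 come from the
# paper excerpt; lr and eps are assumed common defaults.
def adam_step(param, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    m = beta1 * m + (1 - beta1) * grad       # first-moment (mean) estimate
    v = beta2 * v + (1 - beta2) * grad ** 2  # second-moment (variance) estimate
    m_hat = m / (1 - beta1 ** t)             # bias correction for step t
    v_hat = v / (1 - beta2 ** t)
    param = param - lr * m_hat / (v_hat ** 0.5 + eps)
    return param, m, v
```

In a PyTorch training loop this corresponds to `torch.optim.Adam(model.parameters(), betas=(0.9, 0.999))`.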
Experiment Setup Yes Empirically, we use λ = 1.0 for LDepth in Eq (2). Additionally, we use λ1 = 0.9, λ2 = 1.0, λSSIM = 0.1 and λLPIPS = 0.6 in Eq (6). For image condition, we set the feature channel C = 32, the number of depth bins D = 64, znear = 0.1 and zfar = 90.0.
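The reported hyperparameters can be collected into one configuration sketch. The key names below are illustrative (the excerpt does not name its config fields); only the numeric values come from the quoted setup:

```python
# Hypothetical config layout; values taken from the reported setup.
LOSS_WEIGHTS = {
    "lambda_depth": 1.0,   # weight on L_Depth in Eq. (2)
    "lambda_1": 0.9,       # weights in Eq. (6)
    "lambda_2": 1.0,
    "lambda_ssim": 0.1,
    "lambda_lpips": 0.6,
}

IMAGE_CONDITION = {
    "feature_channels": 32,  # C
    "depth_bins": 64,        # D
    "z_near": 0.1,
    "z_far": 90.0,
}
```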