SCube: Instant Large-Scale Scene Reconstruction using VoxSplats
Authors: Xuanchi Ren, Yifan Lu, Hanxue Liang, Jay Zhangjie Wu, Huan Ling, Mike Chen, Sanja Fidler, Francis Williams, Jiahui Huang
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate our performance on the Waymo Open Dataset [53] on the challenging task of reconstructing a scene from sparse images with low overlap. We show that SCube significantly outperforms existing methods on this task. And, from Section 4 (Experiments): In this section, we validate the effectiveness of SCube. First, we present our new data curation pipeline that produces ground-truth voxel grids (Sec. 4.1). Next, we demonstrate SCube's capabilities in scene reconstruction (Sec. 4.2), and further highlight its usefulness in assisting the state-of-the-art Gaussian splatting pipeline (Sec. 4.3). Finally, we showcase other applications of our method (Sec. 4.4) and perform ablation studies to justify our design choices (Sec. 4.5). |
| Researcher Affiliation | Collaboration | Xuanchi Ren1,2,3, Yifan Lu1,4, Hanxue Liang1,5, Zhangjie Wu1,6, Huan Ling1,2,3, Mike Chen1, Sanja Fidler1,2,3, Francis Williams1, Jiahui Huang1. 1NVIDIA, 2University of Toronto, 3Vector Institute, 4Shanghai Jiao Tong University, 5University of Cambridge, 6National University of Singapore |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks (clearly labeled algorithm sections or code-like formatted procedures). |
| Open Source Code | No | Due to institutional constraints, we are not able to release the code until the paper is fully accepted. Upon acceptance, we will release all code and data required to reproduce this work. |
| Open Datasets | Yes | We evaluate our performance on the Waymo Open Dataset [53] on the challenging task of reconstructing a scene from sparse images with low overlap. |
| Dataset Splits | Yes | Our dataset contains 20243 chunks for training and 5380 chunks for evaluation, out of the 798 training and 202 validation sequences. |
| Hardware Specification | Yes | We train both coarse-level and fine-level voxel latent diffusion models with 64 NVIDIA Tesla A100s for 2 days. For the appearance reconstruction model, we train it using 8 NVIDIA Tesla A100s for 2 days. |
| Software Dependencies | No | We train all of our models using the Adam [24] optimizer with β1 = 0.9 and β2 = 0.999. We use PyTorch Lightning [10] for building our distributed training framework. (See the configuration sketch after this table.) |
| Experiment Setup | Yes | Empirically, we use λ = 1.0 for LDepth in Eq (2). Additionally, we use λ1 = 0.9, λ2 = 1.0, λSSIM = 0.1 and λLPIPS = 0.6 in Eq (6). For image condition, we set the feature channel C = 32, the number of depth bins D = 64, znear = 0.1 and zfar = 90.0. |
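For concreteness, the Software Dependencies and Experiment Setup rows above can be read as a single training configuration. The sketch below is a minimal, hedged rendition in PyTorch: only the numeric values (the Adam betas, the λ loss weights from Eq. (2) and Eq. (6), the feature channels, depth bins, and near/far planes) come from the paper; the function names, the learning rate, and the pairing of λ1/λ2 with specific loss terms are assumptions made purely for illustration, not the authors' implementation.

```python
import torch

# Hedged sketch of the training configuration reported in the paper.
# Only the hyperparameter values are taken from the quoted excerpts;
# everything else (names, learning rate, loss-term pairing) is assumed.

def make_optimizer(model: torch.nn.Module, lr: float = 1e-4) -> torch.optim.Adam:
    """Adam with the reported betas (0.9, 0.999); the learning rate is a placeholder."""
    return torch.optim.Adam(model.parameters(), lr=lr, betas=(0.9, 0.999))

# Loss weights quoted from the paper: lambda = 1.0 for the depth term in Eq. (2);
# lambda_1 = 0.9, lambda_2 = 1.0, lambda_SSIM = 0.1, lambda_LPIPS = 0.6 in Eq. (6).
LOSS_WEIGHTS = dict(depth=1.0, lambda_1=0.9, lambda_2=1.0, ssim=0.1, lpips=0.6)

def appearance_loss(term_1: torch.Tensor, term_2: torch.Tensor,
                    ssim: torch.Tensor, lpips: torch.Tensor) -> torch.Tensor:
    """Weighted sum of appearance terms; which losses lambda_1/lambda_2 weight is assumed."""
    w = LOSS_WEIGHTS
    return (w["lambda_1"] * term_1 + w["lambda_2"] * term_2
            + w["ssim"] * ssim + w["lpips"] * lpips)

# Image-conditioning settings quoted from the paper.
IMAGE_COND = dict(feature_channels=32, depth_bins=64, z_near=0.1, z_far=90.0)
```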