SCube: Instant Large-Scale Scene Reconstruction using VoxSplats

Authors: Xuanchi Ren, Yifan Lu, Hanxue Liang, Jay Zhangjie Wu, Huan Ling, Mike Chen, Sanja Fidler, Francis Williams, Jiahui Huang

NeurIPS 2024

Reproducibility Variable Result LLM Response
Research Type Experimental We evaluate our performance on the Waymo Open Dataset [53] on the challenging task of reconstructing a scene from sparse images with low overlap. We show that SCube significantly outperforms existing methods on this task. and 4 Experiments In this section, we validate the effectiveness of SCube. First, we present our new data curation pipeline that produces ground-truth voxel grids (Sec. 4.1). Next, we demonstrate SCube's capabilities in scene reconstruction (Sec. 4.2), and further highlight its usefulness in assisting the state-of-the-art Gaussian splatting pipeline (Sec. 4.3). Finally, we showcase other applications of our method (Sec. 4.4) and perform ablation studies to justify our design choices (Sec. 4.5).
Researcher Affiliation Collaboration Xuanchi Ren1,2,3, Yifan Lu1,4, Hanxue Liang1,5, Zhangjie Wu1,6, Huan Ling1,2,3, Mike Chen1, Sanja Fidler1,2,3, Francis Williams1, Jiahui Huang1. Affiliations: 1NVIDIA, 2University of Toronto, 3Vector Institute, 4Shanghai Jiao Tong University, 5University of Cambridge, 6National University of Singapore
Pseudocode No The paper does not contain structured pseudocode or algorithm blocks (clearly labeled algorithm sections or code-like formatted procedures).
Open Source Code No Due to institutional constraints, we are not able to release the code until the paper is fully accepted. Upon acceptance, we will release all code and data required to reproduce this work.
Open Datasets Yes We evaluate our performance on the Waymo Open Dataset [53] on the challenging task of reconstructing a scene from sparse images with low overlap.
Dataset Splits Yes Our dataset contains 20243 chunks for training and 5380 chunks for evaluation, out of the 798 training and 202 validation sequences.
Hardware Specification Yes We train both coarse-level and fine-level voxel latent diffusion models with 64 NVIDIA Tesla A100s for 2 days. For the appearance reconstruction model, we train it using 8 NVIDIA Tesla A100s for 2 days.
Software Dependencies No We train all of our models using the Adam [24] optimizer with β1 = 0.9 and β2 = 0.999. We use PyTorch Lightning [10] for building our distributed training framework.
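The quoted row only specifies the Adam moment coefficients (β1 = 0.9, β2 = 0.999, which are also PyTorch's defaults). A minimal plain-Python sketch of a single Adam update with those coefficients, assuming a default learning rate and epsilon that the excerpt does not report:

```python
# Sketch of one Adam update step. Only beta1 and beta2 come from the
# paper excerpt; lr and eps are assumed common defaults.
def adam_step(param, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    m = beta1 * m + (1 - beta1) * grad       # first-moment (mean) estimate
    v = beta2 * v + (1 - beta2) * grad ** 2  # second-moment (variance) estimate
    m_hat = m / (1 - beta1 ** t)             # bias correction for step t
    v_hat = v / (1 - beta2 ** t)
    param = param - lr * m_hat / (v_hat ** 0.5 + eps)
    return param, m, v
```

In a PyTorch training loop this corresponds to `torch.optim.Adam(model.parameters(), betas=(0.9, 0.999))`.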
Experiment Setup Yes Empirically, we use λ = 1.0 for LDepth in Eq (2). Additionally, we use λ1 = 0.9, λ2 = 1.0, λSSIM = 0.1 and λLPIPS = 0.6 in Eq (6). For image condition, we set the feature channel C = 32, the number of depth bins D = 64, znear = 0.1 and zfar = 90.0.
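The reported hyperparameters can be collected into one configuration sketch. The key names below are illustrative (the excerpt does not name its config fields); only the numeric values come from the quoted setup:

```python
# Hypothetical config layout; values taken from the reported setup.
LOSS_WEIGHTS = {
    "lambda_depth": 1.0,   # weight on L_Depth in Eq. (2)
    "lambda_1": 0.9,       # weights in Eq. (6)
    "lambda_2": 1.0,
    "lambda_ssim": 0.1,
    "lambda_lpips": 0.6,
}

IMAGE_CONDITION = {
    "feature_channels": 32,  # C
    "depth_bins": 64,        # D
    "z_near": 0.1,
    "z_far": 90.0,
}
```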