Large Spatial Model: End-to-end Unposed Images to Semantic 3D
Authors: Zhiwen Fan, Jian Zhang, Wenyan Cong, Peihao Wang, Renjie Li, Kairun Wen, Shijie Zhou, Achuta Kadambi, Zhangyang "Atlas" Wang, Danfei Xu, Boris Ivanovic, Marco Pavone, Yue Wang
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Comprehensive experiments on various tasks demonstrate that LSM unifies multiple 3D vision tasks directly from unposed images, achieving real-time semantic 3D reconstruction for the first time. |
| Researcher Affiliation | Collaboration | Zhiwen Fan 1,2, Jian Zhang 3, Wenyan Cong 1, Peihao Wang 1, Renjie Li 4, Kairun Wen 3, Shijie Zhou 5, Achuta Kadambi 5, Zhangyang Wang 1, Danfei Xu 2,6, Boris Ivanovic 2, Marco Pavone 2,7, Yue Wang 2,8 — 1 UT Austin, 2 NVIDIA Research, 3 XMU, 4 TAMU, 5 UCLA, 6 Georgia Tech, 7 Stanford University, 8 USC |
| Pseudocode | No | The paper describes the model architecture and procedures in text and diagrams (Figure 2) but does not include structured pseudocode or algorithm blocks. |
| Open Source Code | No | We will release our code after our paper gets accepted. |
| Open Datasets | Yes | leveraging a combined dataset of ScanNet++ [60] and ScanNet [61] |
| Dataset Splits | No | The paper describes training and testing splits ('we select one image out of four as test images, and the rest [are] used as training'), but does not explicitly detail a separate validation split. (An illustrative sketch of this 1-in-4 split follows the table.) |
| Hardware Specification | Yes | Training is on 8 NVIDIA A100 GPUs and lasts for 3 days. |
| Software Dependencies | No | The paper mentions several models and optimizers (e.g., ViT-Large, DPT head, DUSt3R, Point Transformer V3, LSeg, AdamW) but does not provide specific version numbers for these or other software dependencies. |
| Experiment Setup | Yes | The training of our model spans 100 epochs, leveraging a combined dataset of ScanNet++ [60] and ScanNet [61] totaling 1565 scenes. Training on 8 NVIDIA A100 GPUs lasts for 3 days. We start with a base learning rate of 1e-4 and incorporate a 10-epoch warm-up period. AdamW is employed as the optimizer for all experiments. The parameters λ1, λ2, λ3 are set to 0.25, 0.3, and 1.5, respectively, as determined by grid search. (A hedged configuration sketch follows the table.) |
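For concreteness, here is a hypothetical sketch of the 1-in-4 test split quoted in the Dataset Splits row. The function name and the choice of which image in each group of four is held out are our assumptions; the paper does not specify either, and no official code is available.

```python
# Hypothetical illustration of the quoted "one image out of four as test"
# split. Names and the held-out index are our assumptions, not the paper's.
def split_scene_images(image_paths):
    """Hold out every fourth image for testing; train on the rest."""
    test = image_paths[3::4]  # assume the 4th image of each group is held out
    train = [p for i, p in enumerate(image_paths) if (i + 1) % 4 != 0]
    return train, test

train, test = split_scene_images([f"frame_{i:04d}.png" for i in range(12)])
assert len(train) == 9 and len(test) == 3
```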
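The Experiment Setup row pins down the optimizer, schedule, and loss weights. The following is a minimal PyTorch sketch of that configuration under stated assumptions: the warm-up is taken to be linear (the paper does not say), and the model and loss terms are dummy placeholders since the LSM code is unreleased.

```python
import torch
from torch import nn

# Sketch of the reported setup: AdamW, base LR 1e-4, 10-epoch warm-up
# within 100 epochs, loss weights λ1=0.25, λ2=0.3, λ3=1.5 (grid search).
EPOCHS, WARMUP_EPOCHS, BASE_LR = 100, 10, 1e-4
LAMBDA1, LAMBDA2, LAMBDA3 = 0.25, 0.3, 1.5

model = nn.Linear(16, 16)  # dummy stand-in for the LSM network
optimizer = torch.optim.AdamW(model.parameters(), lr=BASE_LR)
# Assumed linear warm-up over the first 10 epochs, constant base LR after.
scheduler = torch.optim.lr_scheduler.LambdaLR(
    optimizer, lr_lambda=lambda e: min(1.0, (e + 1) / WARMUP_EPOCHS))

for epoch in range(EPOCHS):
    x = torch.randn(4, 16)  # dummy batch in place of ScanNet++/ScanNet scenes
    out = model(x)
    # Weighted sum mirroring the λ1, λ2, λ3 combination; the three terms
    # here are placeholders, not the paper's actual objectives.
    loss = (LAMBDA1 * out.pow(2).mean()
            + LAMBDA2 * out.abs().mean()
            + LAMBDA3 * out.mean().abs())
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    scheduler.step()

print(f"final lr: {scheduler.get_last_lr()[0]:.2e}")  # back at 1e-4 after warm-up
```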