Monocular Scene Reconstruction with 3D SDF Transformers
Authors: Weihao Yuan, Xiaodong Gu, Heng Li, Zilong Dong, Siyu Zhu
ICLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The experiments on multiple datasets show that this 3D transformer network generates a more accurate and complete reconstruction, which outperforms previous methods by a large margin. Remarkably, the mesh accuracy is improved by 41.8%, and the mesh completeness is improved by 25.3% on the ScanNet dataset. |
| Researcher Affiliation | Industry | Weihao Yuan, Xiaodong Gu, Heng Li, Zilong Dong, Siyu Zhu Alibaba Group {qianmu.ywh, dadong.gxd, baoshu.lh, list.dzl, siting.zsy} @alibaba-inc.com |
| Pseudocode | No | The paper describes the architecture and components in text and figures (e.g., Figure 1, Figure 3), but it does not include any explicit pseudocode or algorithm blocks. |
| Open Source Code | Yes | Project Page: https://weihaosky.github.io/former3d |
| Open Datasets | Yes | ScanNet (Dai et al., 2017) is a large-scale indoor dataset composed of 1613 RGB-D videos of 806 indoor scenes. ... TUM-RGBD (Sturm et al., 2012) and ICL-NUIM (Handa et al., 2014) are also two datasets composed of RGB-D videos, but with a small number of scenes. |
| Dataset Splits | No | The paper states 'We follow the official train/test split, where there are 1513 scans used for training and 100 scans used for testing.' It explicitly mentions train and test splits, but no separate validation split details are provided. |
| Hardware Specification | Yes | Our work is implemented in Pytorch and trained on Nvidia V100 GPUs. ... The runtime analysis is presented in Table 5. For a fair comparison to previous methods, the time is tested on a chunk of size 1.5 × 1.5 × 1.5 m³ with an Nvidia RTX 3090 GPU. |
| Software Dependencies | No | The paper states 'Our work is implemented in Pytorch' but does not provide a specific version number for Pytorch or any other software dependencies with version information. |
| Experiment Setup | Yes | The network is optimized with the Adam optimizer (β1 = 0.9, β2 = 0.999) with a learning rate of 1 × 10⁻⁴. For a fair comparison with previous methods, the voxel size of the fine level is set to 4 cm, and the TSDF truncation distance is set to triple the voxel size. Thus the voxel sizes of the medium and the coarse levels are 8 cm and 16 cm, respectively. For the balance of efficiency and receptive field, the window size of the sparse window attention is set to 10. ... The view limit is set to 20 in the training, which means twenty images are input to the network for one iteration, while the limit for testing is set to 150. |
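The hyperparameters quoted above (Adam betas, learning rate, and the voxel/truncation hierarchy) can be collected into a small configuration sketch. This is a minimal illustration, not the authors' released code: the `TrainConfig` class and its field names are hypothetical, and only the numeric values come from the paper.

```python
# Hedged sketch of the training configuration reported in the paper.
# TrainConfig and its derived properties are illustrative names, not from
# the authors' repository; the values (lr 1e-4, betas, 4 cm fine voxel,
# truncation = 3x voxel, medium/coarse = 2x/4x fine) are quoted from the paper.
from dataclasses import dataclass


@dataclass
class TrainConfig:
    lr: float = 1e-4                   # Adam learning rate
    betas: tuple = (0.9, 0.999)        # Adam (beta1, beta2)
    fine_voxel_m: float = 0.04         # 4 cm fine-level voxel size

    @property
    def medium_voxel_m(self) -> float:
        # medium level doubles the fine voxel size -> 8 cm
        return 2 * self.fine_voxel_m

    @property
    def coarse_voxel_m(self) -> float:
        # coarse level quadruples the fine voxel size -> 16 cm
        return 4 * self.fine_voxel_m

    @property
    def tsdf_trunc_m(self) -> float:
        # TSDF truncation distance is triple the fine voxel size -> 12 cm
        return 3 * self.fine_voxel_m


cfg = TrainConfig()
print(cfg.medium_voxel_m, cfg.coarse_voxel_m, cfg.tsdf_trunc_m)
```

In an actual PyTorch training loop, `cfg.lr` and `cfg.betas` would be passed to `torch.optim.Adam(model.parameters(), lr=cfg.lr, betas=cfg.betas)`.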