ReTR: Modeling Rendering via Transformer for Generalizable Neural Surface Reconstruction
Authors: Yixun Liang, Hao He, Yingcong Chen
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate the effectiveness of our approach on various datasets, showcasing how our method outperforms the current state-of-the-art approaches in terms of reconstruction quality and generalization ability. Our code is available at https://github.com/YixunLiang/ReTR. |
| Researcher Affiliation | Academia | 1 The Hong Kong University of Science and Technology (Guangzhou). 2 The Hong Kong University of Science and Technology. |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Our code is available at https://github.com/YixunLiang/ReTR. |
| Open Datasets | Yes | The DTU dataset [10] is a large-scale indoor multi-view stereo dataset... Furthermore, we evaluated our model's generalization capabilities by testing it on three additional datasets: Tanks & Temples [12], ETH3D [13], and BlendedMVS [11], where no additional training was performed on the testing datasets. |
| Dataset Splits | No | The paper mentions using 'coarse-to-fine' sampling for training and 'testing split' for evaluation, but does not provide specific details for a validation dataset split (e.g., percentages, sample counts, or a citation to a predefined validation split). |
| Hardware Specification | Yes | GPU: Nvidia RTX 3090; CPU: Intel Xeon Platinum 8180 @ 2.50 GHz |
| Software Dependencies | Yes | CUDA version: 11.1; cuDNN version: 8.0.5; PyTorch version: 1.10.1 |
| Experiment Setup | Yes | During the training stage, we resize the input image to 640 × 512 and set the number of source views to N = 4. To train our model, we employ the Adam optimizer on a single Nvidia RTX 3090 GPU. Initially, the learning rate is set to 10^-4 and gradually decays to 10^-6 using a cosine learning rate scheduler. Throughout training, we use a batch size of 2 and set the number of rays to 1024. To enhance the sampling strategy, we apply a coarse-to-fine approach with both N_coarse and N_fine set to 64. The N_coarse points are uniformly sampled between the near and far planes, while the N_fine points are sampled using importance sampling based on the coarse probability estimation. Regarding the global feature volume f_v, we set its resolution to K = 128. (See the sketches below the table.) |
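As a minimal sketch of the optimizer configuration quoted in the Experiment Setup row, the following PyTorch snippet pairs Adam with a cosine learning-rate schedule decaying from 1e-4 toward 1e-6. The model and total step count are placeholders, not values from the paper.

```python
import torch

# Placeholder model; the paper's actual architecture is not reproduced here.
model = torch.nn.Linear(64, 64)

# Adam with the reported initial learning rate of 1e-4.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

# Cosine schedule decaying toward the reported final learning rate of 1e-6.
# `total_steps` is an assumed placeholder; the paper does not state it here.
total_steps = 100_000
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(
    optimizer, T_max=total_steps, eta_min=1e-6
)

for step in range(total_steps):
    # ... forward pass on a batch of 2 with 1024 rays, loss, backward ...
    optimizer.step()
    optimizer.zero_grad()
    scheduler.step()
```

The coarse-to-fine sampling in the same row can be sketched per ray as follows: N_coarse = 64 depths drawn uniformly between the near and far planes, then N_fine = 64 depths drawn by inverse-CDF importance sampling from the coarse probability estimates. This simplified sketch snaps fine samples to coarse sample locations rather than interpolating within bins as full NeRF-style implementations do; the near/far values and the stand-in weights are illustrative assumptions.

```python
import torch

def sample_coarse(near, far, n_coarse=64):
    """Uniformly sample depths between the near and far planes."""
    t = torch.linspace(0.0, 1.0, n_coarse)        # (n_coarse,)
    return near + (far - near) * t                # (n_coarse,)

def sample_fine(coarse_depths, weights, n_fine=64):
    """Importance-sample fine depths from the coarse probability
    estimates via inverse-transform sampling of the discrete PDF."""
    pdf = weights / (weights.sum() + 1e-8)        # normalize to a PDF
    cdf = torch.cumsum(pdf, dim=-1)               # monotone CDF in [0, 1]
    u = torch.rand(n_fine)                        # uniform draws in [0, 1)
    idx = torch.searchsorted(cdf, u).clamp(max=len(cdf) - 1)
    return coarse_depths[idx]

# Single-ray example with assumed near/far planes.
near, far = 0.5, 2.5
coarse = sample_coarse(near, far)
weights = torch.rand(64)   # stand-in for the coarse probability estimation
fine = sample_fine(coarse, weights)
```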