WT-MVSNet: Window-based Transformers for Multi-view Stereo
Authors: Jinli Liao, Yikang Ding, Yoli Shavit, Dihe Huang, Shihao Ren, Jia Guo, Wensen Feng, Kai Zhang
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments show that WT-MVSNet achieves state-of-the-art performance across multiple datasets and ranks 1st on the Tanks and Temples benchmark. |
| Researcher Affiliation | Collaboration | Jinli Liao (1,2), Yikang Ding (1), Yoli Shavit (3), Dihe Huang (1), Shihao Ren (1,2), Jia Guo (2), Wensen Feng (2), Kai Zhang (1,4). Affiliations: (1) Tsinghua University; (2) Huawei Technologies; (3) Bar-Ilan University; (4) Research Institute of Tsinghua, Pearl River Delta. |
| Pseudocode | No | The paper does not include any explicit pseudocode or algorithm blocks. |
| Open Source Code | Yes | Did you include the code, data, and instructions needed to reproduce the main experimental results (either in the supplemental material or as a URL)? [Yes] In the supplemental material. |
| Open Datasets | Yes | We implement WT-MVSNet based on PyTorch, and train it on the DTU training set. Section 4.2 'Datasets' describes DTU as 'an indoor dataset...', Tanks and Temples as 'a large-scale benchmark...', and BlendedMVS as 'a large-scale synthetic dataset...'. All are well-established and cited benchmarks. |
| Dataset Splits | Yes | DTU dataset is split into 79 training scans, 18 validation scans, and 22 evaluation scans. |
| Hardware Specification | Yes | We train our model with a batch size of 1 on 8 Tesla V100 GPUs. |
| Software Dependencies | No | We implement WT-MVSNet based on PyTorch. The paper does not give version numbers for PyTorch or other software dependencies. |
| Experiment Setup | Yes | We train our model using Adam for 16 epochs at a learning rate of 0.001, which decays by a factor of 0.5 after epochs 6, 8, and 12. We set the combination coefficient γ = 100.0, the loss weights λ1 = 2.0 and λ2 = 1.0, and the reprojection-error thresholds τ1 to 3.0, 2.0, 1.0 and τ2 to 0.1, 0.05, 0.01 at the three resolutions. We train with a batch size of 1 on 8 Tesla V100 GPUs; a minimal sketch of this schedule appears below the table. |
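
The optimizer and learning-rate schedule reported in the Experiment Setup row map directly onto standard PyTorch components. Below is a minimal sketch of that schedule; the `torch.nn.Linear` stand-in model and the commented-out `train_one_epoch` call are hypothetical placeholders, since the authors' actual training code is provided in their supplemental material.

```python
import torch

# Stand-in model; the real WT-MVSNet architecture is defined in the paper's code.
model = torch.nn.Linear(8, 1)

# Adam at lr = 0.001, as reported in the Experiment Setup row.
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)

# Learning rate decays by a factor of 0.5 after epochs 6, 8, and 12 (16 epochs total).
scheduler = torch.optim.lr_scheduler.MultiStepLR(
    optimizer, milestones=[6, 8, 12], gamma=0.5
)

for epoch in range(16):
    # A real train_one_epoch would iterate the DTU training set with batch size 1
    # per GPU, combining losses with the reported weights λ1 = 2.0 and λ2 = 1.0.
    # train_one_epoch(model, optimizer)
    scheduler.step()
```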