WT-MVSNet: Window-based Transformers for Multi-view Stereo

Authors: Jinli Liao, Yikang Ding, Yoli Shavit, Dihe Huang, Shihao Ren, Jia Guo, Wensen Feng, Kai Zhang

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our WT-MVSNet achieves state-of-the-art performance across multiple datasets and ranks 1st on the Tanks and Temples benchmark, as shown by extensive experiments.
Researcher Affiliation | Collaboration | Jinli Liao (1,2), Yikang Ding (1), Yoli Shavit (3), Dihe Huang (1), Shihao Ren (1,2), Jia Guo (2), Wensen Feng (2), Kai Zhang (1,4); 1: Tsinghua University, 2: Huawei Technologies, 3: Bar-Ilan University, 4: Research Institute of Tsinghua, Pearl River Delta
Pseudocode | No | The paper does not include any explicit pseudocode or algorithm blocks.
Open Source Code | Yes | "Did you include the code, data, and instructions needed to reproduce the main experimental results (either in the supplemental material or as a URL)? [Yes] In the supplemental material."
Open Datasets | Yes | We implement WT-MVSNet based on PyTorch and train it on the DTU training set. Section 4.2 'Datasets' describes 'DTU is an indoor dataset...', 'Tanks and Temples is a large-scale benchmark...', 'BlendedMVS is a large-scale synthetic dataset...'. All are well-established and cited benchmarks.
Dataset Splits | Yes | The DTU dataset is split into 79 training scans, 18 validation scans, and 22 evaluation scans.
Hardware Specification | Yes | We train our model with a batch size of 1 on 8 Tesla V100 GPUs.
Software Dependencies | No | We implement WT-MVSNet based on PyTorch. The paper does not provide specific version numbers for PyTorch or other software dependencies.
Experiment Setup | Yes | We train our model using Adam for 16 epochs at a learning rate of 0.001, which decays by a factor of 0.5 after 6, 8, and 12 epochs. We set the combination coefficient γ = 100.0, the loss weights λ1 = 2.0 and λ2 = 1.0, and the reprojection error thresholds τ1 to 3.0, 2.0, 1.0 and τ2 to 0.1, 0.05, 0.01 at the 3 resolutions. We train with a batch size of 1 on 8 Tesla V100 GPUs (the schedule and threshold check are sketched below the table).
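
The reported optimizer, decay schedule, and loss weights map directly onto standard PyTorch components. The sketch below is illustrative, not the authors' code: the model and the two loss terms are dummy placeholders, and only Adam, the 0.001 learning rate, the 0.5 decay at epochs 6, 8, and 12, the 16-epoch budget, the batch size of 1, and the weights λ1 = 2.0, λ2 = 1.0 come from the paper.

```python
# Minimal sketch of the reported optimization schedule (not the authors'
# code). The model, data, and loss terms below are stand-ins; the optimizer,
# milestones, decay factor, epoch count, batch size, and loss weights are
# the values reported in the paper.
import torch
import torch.nn as nn

model = nn.Conv2d(3, 1, 3, padding=1)            # placeholder for WT-MVSNet
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# Learning rate decays by 0.5 after epochs 6, 8, and 12 (16 epochs total).
scheduler = torch.optim.lr_scheduler.MultiStepLR(
    optimizer, milestones=[6, 8, 12], gamma=0.5)

lambda1, lambda2 = 2.0, 1.0                      # reported loss weights

for epoch in range(16):
    for _ in range(4):                           # stand-in for the DTU loader
        x = torch.randn(1, 3, 64, 64)            # batch size 1, as reported
        pred = model(x)
        # Placeholder loss terms; the paper combines two weighted losses.
        loss = lambda1 * pred.abs().mean() + lambda2 * pred.pow(2).mean()
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    scheduler.step()
```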
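The thresholds τ1 (pixel reprojection error) and τ2 (relative depth error) are of the kind used in standard cross-view geometric consistency checks in multi-view stereo. The paper's exact formulation is not quoted here, so the following NumPy sketch is an assumption-laden illustration: it presumes pinhole intrinsics `K_ref`/`K_src`, a reference-to-source rotation `R` and translation `t`, and valid positive depths, with thresholds tightened across the 3 resolution levels as reported.

```python
# Hedged sketch of a round-trip reprojection consistency check using the
# reported per-resolution thresholds; all camera conventions here are
# assumptions, not the paper's code.
import numpy as np

TAU1 = [3.0, 2.0, 1.0]      # pixel reprojection thresholds per resolution
TAU2 = [0.1, 0.05, 0.01]    # relative depth thresholds per resolution

def reprojection_errors(depth_ref, depth_src, K_ref, K_src, R, t):
    """Project reference pixels into the source view and back; return the
    round-trip pixel error and relative depth error per pixel."""
    h, w = depth_ref.shape
    v, u = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    pix = np.stack([u, v, np.ones_like(u)], 0).reshape(3, -1).astype(np.float64)
    d_ref = np.maximum(depth_ref.reshape(-1), 1e-8)

    # Lift to 3D in the reference frame, transform into the source frame.
    xyz_ref = np.linalg.inv(K_ref) @ (pix * d_ref)
    xyz_src = R @ xyz_ref + t.reshape(3, 1)
    proj = K_src @ xyz_src
    u_src, v_src = proj[0] / proj[2], proj[1] / proj[2]

    # Sample the source depth map (nearest neighbour for brevity).
    ui = np.clip(np.round(u_src).astype(int), 0, w - 1)
    vi = np.clip(np.round(v_src).astype(int), 0, h - 1)
    d_src = depth_src[vi, ui]

    # Back-project the sampled source depth into the reference view.
    uv1 = np.stack([u_src, v_src, np.ones_like(u_src)], 0)
    xyz_src_pts = np.linalg.inv(K_src) @ (uv1 * d_src)
    xyz_back = np.linalg.inv(R) @ (xyz_src_pts - t.reshape(3, 1))
    proj_back = K_ref @ xyz_back
    u_back, v_back = proj_back[0] / proj_back[2], proj_back[1] / proj_back[2]
    d_back = proj_back[2]

    pix_err = np.sqrt((u_back - pix[0]) ** 2 + (v_back - pix[1]) ** 2)
    depth_err = np.abs(d_back - d_ref) / d_ref
    return pix_err.reshape(h, w), depth_err.reshape(h, w)

def consistent_mask(pix_err, depth_err, level):
    """A pixel passes when both errors fall under the level's thresholds."""
    return (pix_err < TAU1[level]) & (depth_err < TAU2[level])
```

The design choice of tightening both thresholds at finer resolutions matches the reported triples (3.0, 2.0, 1.0) and (0.1, 0.05, 0.01): coarser levels tolerate larger reprojection drift, while the finest level demands near-exact agreement.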