WT-MVSNet: Window-based Transformers for Multi-view Stereo

Authors: Jinli Liao, Yikang Ding, Yoli Shavit, Dihe Huang, Shihao Ren, Jia Guo, Wensen Feng, Kai Zhang

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our WT-MVSNet achieves state-of-the-art performance across multiple datasets and ranks 1st on the Tanks and Temples benchmark, as shown by extensive experiments.
Researcher Affiliation | Collaboration | Jinli Liao (1,2), Yikang Ding (1), Yoli Shavit (3), Dihe Huang (1), Shihao Ren (1,2), Jia Guo (2), Wensen Feng (2), Kai Zhang (1,4); 1: Tsinghua University, 2: Huawei Technologies, 3: Bar-Ilan University, 4: Research Institute of Tsinghua, Pearl River Delta
Pseudocode | No | The paper does not include any explicit pseudocode or algorithm blocks.
Open Source Code | Yes | "Did you include the code, data, and instructions needed to reproduce the main experimental results (either in the supplemental material or as a URL)? [Yes] In the supplemental material."
Open Datasets | Yes | We implement WT-MVSNet based on PyTorch and train it on the DTU training set. Section 4.2 'Datasets' describes 'DTU is an indoor dataset...', 'Tanks and Temples is a large-scale benchmark...', 'BlendedMVS is a large-scale synthetic dataset...'. All are well-established and cited benchmarks.
Dataset Splits | Yes | The DTU dataset is split into 79 training scans, 18 validation scans, and 22 evaluation scans.
Hardware Specification | Yes | We train our model with a batch size of 1 on 8 Tesla V100 GPUs.
Software Dependencies | No | We implement WT-MVSNet based on PyTorch. The paper does not provide specific version numbers for PyTorch or other software dependencies.
Experiment Setup | Yes | We train our model using Adam for 16 epochs at a learning rate of 0.001, which decays by a factor of 0.5 after 6, 8, and 12 epochs. We set the combination coefficient γ = 100.0, the loss weights λ1 = 2.0 and λ2 = 1.0, and the reprojection error thresholds τ1 to 3.0, 2.0, 1.0 and τ2 to 0.1, 0.05, 0.01 at the 3 resolutions. We train with a batch size of 1 on 8 Tesla V100 GPUs (the schedule and threshold check are sketched below the table).
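
The reported optimizer, decay schedule, and loss weights map directly onto standard PyTorch components. The sketch below is illustrative, not the authors' code: the model and the two loss terms are dummy placeholders, and only Adam, the 0.001 learning rate, the 0.5 decay at epochs 6, 8, and 12, the 16-epoch budget, the batch size of 1, and the weights λ1 = 2.0, λ2 = 1.0 come from the paper.

```python
# Minimal sketch of the reported optimization schedule (not the authors'
# code). The model, data, and loss terms below are stand-ins; the optimizer,
# milestones, decay factor, epoch count, batch size, and loss weights are
# the values reported in the paper.
import torch
import torch.nn as nn

model = nn.Conv2d(3, 1, 3, padding=1)            # placeholder for WT-MVSNet
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# Learning rate decays by 0.5 after epochs 6, 8, and 12 (16 epochs total).
scheduler = torch.optim.lr_scheduler.MultiStepLR(
    optimizer, milestones=[6, 8, 12], gamma=0.5)

lambda1, lambda2 = 2.0, 1.0                      # reported loss weights

for epoch in range(16):
    for _ in range(4):                           # stand-in for the DTU loader
        x = torch.randn(1, 3, 64, 64)            # batch size 1, as reported
        pred = model(x)
        # Placeholder loss terms; the paper combines two weighted losses.
        loss = lambda1 * pred.abs().mean() + lambda2 * pred.pow(2).mean()
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    scheduler.step()
```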
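The thresholds τ1 (pixel reprojection error) and τ2 (relative depth error) are of the kind used in standard cross-view geometric consistency checks in multi-view stereo. The paper's exact formulation is not quoted here, so the following NumPy sketch is an assumption-laden illustration: it presumes pinhole intrinsics `K_ref`/`K_src`, a reference-to-source rotation `R` and translation `t`, and valid positive depths, with thresholds tightened across the 3 resolution levels as reported.

```python
# Hedged sketch of a round-trip reprojection consistency check using the
# reported per-resolution thresholds; all camera conventions here are
# assumptions, not the paper's code.
import numpy as np

TAU1 = [3.0, 2.0, 1.0]      # pixel reprojection thresholds per resolution
TAU2 = [0.1, 0.05, 0.01]    # relative depth thresholds per resolution

def reprojection_errors(depth_ref, depth_src, K_ref, K_src, R, t):
    """Project reference pixels into the source view and back; return the
    round-trip pixel error and relative depth error per pixel."""
    h, w = depth_ref.shape
    v, u = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    pix = np.stack([u, v, np.ones_like(u)], 0).reshape(3, -1).astype(np.float64)
    d_ref = np.maximum(depth_ref.reshape(-1), 1e-8)

    # Lift to 3D in the reference frame, transform into the source frame.
    xyz_ref = np.linalg.inv(K_ref) @ (pix * d_ref)
    xyz_src = R @ xyz_ref + t.reshape(3, 1)
    proj = K_src @ xyz_src
    u_src, v_src = proj[0] / proj[2], proj[1] / proj[2]

    # Sample the source depth map (nearest neighbour for brevity).
    ui = np.clip(np.round(u_src).astype(int), 0, w - 1)
    vi = np.clip(np.round(v_src).astype(int), 0, h - 1)
    d_src = depth_src[vi, ui]

    # Back-project the sampled source depth into the reference view.
    uv1 = np.stack([u_src, v_src, np.ones_like(u_src)], 0)
    xyz_src_pts = np.linalg.inv(K_src) @ (uv1 * d_src)
    xyz_back = np.linalg.inv(R) @ (xyz_src_pts - t.reshape(3, 1))
    proj_back = K_ref @ xyz_back
    u_back, v_back = proj_back[0] / proj_back[2], proj_back[1] / proj_back[2]
    d_back = proj_back[2]

    pix_err = np.sqrt((u_back - pix[0]) ** 2 + (v_back - pix[1]) ** 2)
    depth_err = np.abs(d_back - d_ref) / d_ref
    return pix_err.reshape(h, w), depth_err.reshape(h, w)

def consistent_mask(pix_err, depth_err, level):
    """A pixel passes when both errors fall under the level's thresholds."""
    return (pix_err < TAU1[level]) & (depth_err < TAU2[level])
```

The design choice of tightening both thresholds at finer resolutions matches the reported triples (3.0, 2.0, 1.0) and (0.1, 0.05, 0.01): coarser levels tolerate larger reprojection drift, while the finest level demands near-exact agreement.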