A Global Depth-Range-Free Multi-View Stereo Transformer Network with Pose Embedding

Authors: Yitong Dong, Yijin Li, Zhaoyang Huang, Weikang Bian, Jingbo Liu, Hujun Bao, Zhaopeng Cui, Hongsheng Li, Guofeng Zhang

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive results on the DTU dataset and Tanks&Temples benchmark demonstrate the effectiveness of our method.
Researcher Affiliation | Collaboration | State Key Lab of CAD&CG, Zhejiang University; CUHK MMLab
Pseudocode | No | The paper describes the method using text and mathematical equations, but does not include any pseudocode or algorithm blocks.
Open Source Code | No | We plan to release the code and detailed results later.
Open Datasets | Yes | DTU dataset [23] is an indoor multi-view stereo dataset... Blended MVS dataset [66] is a large-scale outdoor multi-view stereo dataset... Tanks and Temples [24] is a public multi-view stereo benchmark.
Dataset Splits | Yes | Following MVSNet [8], we partitioned the DTU dataset into 79 training sets, 18 validation sets, and 22 evaluation sets.
Hardware Specification | Yes | The training procedure is finished on two A100 GPUs.
Software Dependencies | No | The paper mentions 'Implemented by PyTorch [67]' but does not provide specific version numbers for PyTorch or any other software dependencies.
Experiment Setup | Yes | On the DTU dataset, we set the image resolution as 640×512 and the number of input images as 5 for the training phase... For all models, we use the AdamW optimizer with an initial learning rate of 0.0002 that halves every four epochs for 16 epochs.
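The optimizer schedule quoted above (AdamW, initial learning rate 2e-4, halved every four epochs over 16 epochs) maps directly onto standard PyTorch components. The sketch below is illustrative only: the `torch.nn.Linear` stand-in model and the empty training step are assumptions, not the paper's network or training loop.

```python
import torch

# Hypothetical tiny model standing in for the paper's transformer network.
model = torch.nn.Linear(8, 1)

# Schedule described in the paper: AdamW, initial LR 2e-4,
# halved every four epochs, trained for 16 epochs.
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-4)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=4, gamma=0.5)

lrs = []
for epoch in range(16):
    # ... training loop over 5-view, 640x512 DTU batches would run here ...
    optimizer.step()  # placeholder step so the scheduler sequence is valid
    scheduler.step()
    lrs.append(optimizer.param_groups[0]["lr"])
```

After the loop, `lrs` holds the per-epoch learning rates: 2e-4 for epochs 1-3, then halving at every fourth epoch boundary down to 1.25e-5 by the final epoch.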