CostFormer: Cost Transformer for Cost Aggregation in Multi-view Stereo

Authors: Weitao Chen, Hongbin Xu, Zhipeng Zhou, Yang Liu, Baigui Sun, Wenxiong Kang, Xuansong Xie

IJCAI 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | The experimental results on DTU, Tanks&Temples, ETH3D, BlendedMVS, and YFCC show that our method is competitive, efficient, and plug-and-play. For the evaluation on Tanks&Temples, we use the DTU dataset and the BlendedMVS dataset. ... The quantitative results on the Tanks&Temples set are summarized in Tables 1 and 2, which indicate the robustness of CostFormer.
Researcher Affiliation | Collaboration | Weitao Chen (1), Hongbin Xu (1,2), Zhipeng Zhou (1), Yang Liu (1), Baigui Sun (1), Wenxiong Kang (2,3) and Xuansong Xie (1); (1) Alibaba Group, (2) South China University of Technology, (3) Guangdong Artificial Intelligence and Digital Economy Laboratory, Pazhou Lab
Pseudocode | No | The paper describes methods in text and equations but does not include structured pseudocode or clearly labeled algorithm blocks.
Open Source Code | No | Appendix is presented in: https://arxiv.org/abs/2305.10320. The paper does not explicitly state that source code is released or provide a direct link to a code repository for the described methodology.
Open Datasets | Yes | The datasets used in the evaluation are DTU [Aanæs et al., 2016], BlendedMVS [Yao et al., 2020], ETH3D [Schöps et al., 2017], Tanks&Temples [Knapitsch et al., 2017], and YFCC-100M [Thomee et al., 2016].
Dataset Splits | Yes | For the evaluation on the DTU evaluation set, we only use the DTU training set. The BlendedMVS dataset is a large-scale synthetic dataset, consisting of 113 indoor and outdoor scenes and split into 106 training scenes and 7 validation scenes.
Hardware Specification | Yes | All models are trained on NVIDIA Tesla V100 GPUs. For a fair comparison, a fixed input size of 1152 × 864 is used to evaluate the computational cost on a single NVIDIA Tesla V100 GPU.
Software Dependencies | No | CostFormer is implemented in PyTorch [Paszke et al., 2019]. The paper mentions PyTorch but does not provide a specific version number for it or other software dependencies.
Experiment Setup | Yes | For RDACT, we set the depth numbers at stages 3, 2, 1 as 4, 2, 2; the patch size along the height, width, and depth axes as 4, 4, 1; and the window size along the height, width, and depth axes as 7, 7, 2. If the backbone is PatchmatchNet, the embedding dimension numbers at stages 3, 2, 1 are set as 8, 8, 4. For RRT, we set the depth number as 2 at all stages, the patch size as 1 on all axes, and the window size as 8 on all axes. If the backbone is PatchmatchNet, the embedding dimension numbers for the 2, 2, 1 iterations at stages 3, 2, 1 are set as 32, 64, 16, 16, 8. During the training phase, we set the image resolution to 640 × 512.
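To make the quoted hyperparameters easier to scan, the sketch below collects them into a plain Python configuration. This is an illustrative reconstruction from the quoted text only: the dictionary layout and key names are hypothetical, since the authors' code is not publicly released.

```python
# Hypothetical configuration sketch assembled from the hyperparameters quoted
# above. Key names are illustrative; they do not come from the authors' code.

# RDACT settings, stages ordered 3, 2, 1 as in the paper.
rdact_cfg = {
    "depth_num":   (4, 2, 2),   # transformer depth at stages 3, 2, 1
    "patch_size":  (4, 4, 1),   # patch size along (height, width, depth)
    "window_size": (7, 7, 2),   # window size along (height, width, depth)
    "embed_dim":   (8, 8, 4),   # per-stage dims with a PatchmatchNet backbone
}

# RRT settings.
rrt_cfg = {
    "depth_num":   2,           # same depth at all stages
    "patch_size":  1,           # same patch size on all axes
    "window_size": 8,           # same window size on all axes
    # Embedding dims for the 2, 2, 1 iterations at stages 3, 2, 1
    # (PatchmatchNet backbone):
    "embed_dim":   (32, 64, 16, 16, 8),
}

# Training-phase input resolution quoted in the paper (640 × 512).
train_resolution = (640, 512)
```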