CostFormer: Cost Transformer for Cost Aggregation in Multi-view Stereo
Authors: Weitao Chen, Hongbin Xu, Zhipeng Zhou, Yang Liu, Baigui Sun, Wenxiong Kang, Xuansong Xie
IJCAI 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The experimental results on DTU, Tanks&Temples, ETH3D, BlendedMVS, and YFCC show that our method is competitive, efficient, and plug-and-play. For the evaluation on Tanks&Temples, we use the DTU dataset and the BlendedMVS dataset. ... The quantitative results on the Tanks&Temples set are summarized in Tables 1 and 2, which indicate the robustness of CostFormer. |
| Researcher Affiliation | Collaboration | Weitao Chen1, Hongbin Xu1,2, Zhipeng Zhou1, Yang Liu1, Baigui Sun1, Wenxiong Kang2,3 and Xuansong Xie1. 1Alibaba Group, 2South China University of Technology, 3Guangdong Artificial Intelligence and Digital Economy Laboratory, Pazhou Lab |
| Pseudocode | No | The paper describes methods in text and equations but does not include structured pseudocode or clearly labeled algorithm blocks. |
| Open Source Code | No | Appendix is presented in: https://arxiv.org/abs/2305.10320. The paper does not explicitly state that source code is released or provide a direct link to a code repository for the described methodology. |
| Open Datasets | Yes | The datasets used in the evaluation are DTU [Aanæs et al., 2016], BlendedMVS [Yao et al., 2020], ETH3D [Schöps et al., 2017], Tanks&Temples [Knapitsch et al., 2017], and YFCC-100M [Thomee et al., 2016]. |
| Dataset Splits | Yes | For the evaluation on the DTU evaluation set, we only use the DTU training set. The BlendedMVS dataset is a large-scale synthetic dataset, consisting of 113 indoor and outdoor scenes and split into 106 training scenes and 7 validation scenes. |
| Hardware Specification | Yes | All models are trained on Nvidia V100 GPUs. For a fair comparison, a fixed input size of 1152 × 864 is used to evaluate the computational cost on a single NVIDIA Tesla V100 GPU. |
| Software Dependencies | No | CostFormer is implemented in PyTorch [Paszke et al., 2019]. The paper mentions PyTorch but does not provide a specific version number for it or other software dependencies. |
| Experiment Setup | Yes | For RDACT, we set the depth number at stages 3, 2, 1 as 4, 2, 2; the patch size at the height, width and depth axes as 4, 4, 1; and the window size at the height, width and depth axes as 7, 7, 2. If the backbone is set as PatchMatchNet, the embedding dimension numbers at stages 3, 2, 1 are set as 8, 8, 4. For RRT, we set the depth number as 2 at all stages, the patch size as 1 at all axes, and the window size as 8 at all axes. If the backbone is set as PatchMatchNet, the embedding dimension numbers at iterations 2, 2, 1 at stages 3, 2, 1 are 32, 64, 16, 16, 8. During the training phase, we set the image resolution to 640 × 512. (A configuration sketch of these hyperparameters follows the table.) |
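
The experiment-setup row above can be summarized as a plain configuration sketch. This is only an illustration assembled from the quoted description: the `costformer_config` name and the dictionary keys are hypothetical and do not come from the authors' code, and the values are copied from the paper's text as quoted.

```python
# Hypothetical configuration sketch of the quoted experiment setup.
# Key names are illustrative; values follow the paper's description.

costformer_config = {
    "rdact": {
        # depth number at stages 3, 2, 1
        "depths": (4, 2, 2),
        # patch size along the (height, width, depth) axes
        "patch_size": (4, 4, 1),
        # window size along the (height, width, depth) axes
        "window_size": (7, 7, 2),
        # embedding dimensions at stages 3, 2, 1 with a PatchMatchNet backbone
        "embed_dims_patchmatchnet": (8, 8, 4),
    },
    "rrt": {
        # depth number of 2 at all stages
        "depths": (2, 2, 2),
        # patch size of 1 and window size of 8 on all axes
        "patch_size": (1, 1, 1),
        "window_size": (8, 8, 8),
        # The paper quotes embedding dimensions 32, 64, 16, 16, 8 across
        # iterations/stages with a PatchMatchNet backbone; the exact mapping
        # is ambiguous in the extracted text, so it is not expanded here.
    },
    # training image resolution (width, height)
    "train_resolution": (640, 512),
}
```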