Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
MVSFormer: Multi-View Stereo by Learning Robust Image Features and Temperature-based Depth
Authors: Chenjie Cao, Xinlin Ren, Yanwei Fu
TMLR 2022 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | MVSFormer achieves state-of-the-art performance on the DTU dataset. Particularly, MVSFormer ranks as Top-1 on both intermediate and advanced sets of the highly competitive Tanks-and-Temples leaderboard. Codes and models are released in https://github.com/ewrfcas/MVSFormer. Our methods are evaluated on DTU (Aanæs et al., 2016), Tanks-and-Temples (Knapitsch et al., 2017) and ETH3D (Schops et al., 2017). Our MVSFormer is evaluated on DTU with the official evaluation metrics of point clouds, i.e., accuracy, completeness, and the overall error. Quantitative results of DTU are shown in Tab. 1, and qualitative ones are shown in Fig. 9 and Fig. 10 of the Appendix. Our submission of the full trainable MVSFormer has ranked Top-1 on both intermediate and advanced sets of the official Tanks-and-Temples leaderboard compared with other published works since May/2022. Ablation studies: We have tested different pre-trained models for MVS in Tab. 4, which include ResNet50 (He et al., 2016), DINO (Caron et al., 2021), MAE (He et al., 2021), and Twins (Chu et al., 2021a). |
| Researcher Affiliation | Academia | Chenjie Cao EMAIL School of Data Science, Fudan University Xinlin Ren EMAIL School of Data Science, Fudan University Yanwei Fu EMAIL School of Data Science, Fudan University |
| Pseudocode | Yes | A.1 Multi-scale Training: The PyTorch pseudo-code of the multi-scale training is summarized in Alg. 1. |
| Open Source Code | Yes | Codes and models are released in https://github.com/ewrfcas/MVSFormer. |
| Open Datasets | Yes | Our methods are evaluated on DTU (Aanæs et al., 2016), Tanks-and-Temples (Knapitsch et al., 2017) and ETH3D (Schops et al., 2017). Since DTU data is collected in an indoor environment with fixed camera poses, our model is finetuned on the BlendedMVS dataset (Yao et al., 2020) with various scenes and objects to generalize to more complex environments in Tanks-and-Temples and ETH3D, as standard practice in Giang et al. (2021); Ding et al. (2021). |
| Dataset Splits | Yes | Our MVSFormer is evaluated on DTU with the official evaluation metrics of point clouds, i.e., accuracy, completeness, and the overall error. The testing resolution is fixed at 1152×1536 and the view number N = 5. Our submission of the full trainable MVSFormer has ranked Top-1 on both intermediate and advanced sets of the official Tanks-and-Temples leaderboard compared with other published works since May/2022. To show the robustness of the proposed method in scene data, we additionally evaluate MVSFormer on both the training and test sets of the high-resolution ETH3D (Schops et al., 2017) without re-training. |
| Hardware Specification | Yes | Thanks to mixed-precision training, it only takes about 22 and 15 hours for our proposed MVSFormer to be trained for 10 epochs on DTU (Aanæs et al., 2016) and BlendedMVS (Yao et al., 2020) respectively, with two V100 32GB NVIDIA Tesla GPUs. |
| Software Dependencies | No | The paper mentions "PyTorch pseudo-code" but does not specify a version for PyTorch or any other software dependencies. |
| Experiment Setup | Yes | MVSFormer is trained with the view number N = 5 and 4 coarse-to-fine stages of 32-16-8-4 depth hypotheses. CNN parts in MVSFormer are trained by Adam with a learning rate of 1e-3. The Twins-small part of MVSFormer is trained with a learning rate of 3e-5 and 0.01 weight decay, while DINO-small is frozen in MVSFormer-P. Our models are trained for 10 epochs on DTU and finetuned for another 10 epochs on BlendedMVS. The learning rate is warmed up over 500 steps and then decayed with a cosine scheduler. For the multi-scale training, we dynamically change the sub-batch from 8 to 2 according to the scales from 512 to 1280, with a maximum batch size of 8. More details are in Appendix Sec. A. |
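The multi-scale training row above says the sub-batch size shrinks from 8 to 2 as the training resolution grows from 512 to 1280, under a maximum batch size of 8. The paper's Alg. 1 (not reproduced here) gives the exact procedure; the sketch below only illustrates one plausible sizing rule, assuming linear interpolation between the two endpoints, which the quoted text does not actually specify.

```python
def sub_batch_size(scale, lo=512, hi=1280, max_sub=8, min_sub=2):
    """Pick a sub-batch size for a given training resolution.

    Assumption: linear interpolation between (lo, max_sub) and
    (hi, min_sub); the paper only states the endpoints, not the rule.
    """
    t = (scale - lo) / (hi - lo)          # 0 at the smallest scale, 1 at the largest
    t = min(max(t, 0.0), 1.0)             # clamp scales outside [lo, hi]
    return max(min_sub, round(max_sub - t * (max_sub - min_sub)))

# Larger images get smaller sub-batches so memory use stays roughly constant:
for s in (512, 896, 1280):
    print(s, sub_batch_size(s))
```

Gradients from the sub-batches would then be accumulated up to the effective batch size of 8 before each optimizer step, which keeps memory bounded at high resolutions without changing the effective batch size.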
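The Experiment Setup row also describes the learning-rate schedule: a 500-step linear warmup followed by cosine decay. A minimal self-contained sketch of that schedule, with `base_lr` and `total_steps` as illustrative placeholders (the paper reports a 1e-3 learning rate for the CNN parts but the total step count depends on dataset size and batch size):

```python
import math

def lr_at_step(step, base_lr=1e-3, warmup_steps=500, total_steps=10_000):
    """Linear warmup for `warmup_steps`, then cosine decay to zero.

    `total_steps` is a placeholder; the paper trains for 10 epochs per
    dataset, so the real value depends on steps per epoch.
    """
    if step < warmup_steps:
        # Ramp linearly from base_lr/warmup_steps up to base_lr.
        return base_lr * (step + 1) / warmup_steps
    # Cosine anneal from base_lr down to 0 over the remaining steps.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))
```

In PyTorch this is typically composed from `torch.optim.lr_scheduler.LambdaLR` or `CosineAnnealingLR`; the pure-Python form above just makes the shape of the schedule explicit.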