Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
MVSFormer: Multi-View Stereo by Learning Robust Image Features and Temperature-based Depth
Authors: Chenjie Cao, Xinlin Ren, Yanwei Fu
TMLR 2022 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | MVSFormer achieves state-of-the-art performance on the DTU dataset. Particularly, MVSFormer ranks as Top-1 on both intermediate and advanced sets of the highly competitive Tanks-and-Temples leaderboard. Codes and models are released in https://github.com/ewrfcas/MVSFormer. Our methods are evaluated on DTU (Aanæs et al., 2016), Tanks-and-Temples (Knapitsch et al., 2017) and ETH3D (Schops et al., 2017). Our MVSFormer is evaluated on DTU with the official evaluation metrics of point clouds, i.e., accuracy, completeness, and the overall error. Quantitative results of DTU are shown in Tab. 1, and qualitative ones are shown in Fig. 9 and Fig. 10 of the Appendix. Our submission of the full trainable MVSFormer has ranked Top-1 on both intermediate and advanced sets of the official Tanks-and-Temples leaderboard compared with other published works since May/2022. Ablation studies: We have tested different pre-trained models for MVS in Tab. 4, which include ResNet50 (He et al., 2016), DINO (Caron et al., 2021), MAE (He et al., 2021), and Twins (Chu et al., 2021a). |
| Researcher Affiliation | Academia | Chenjie Cao EMAIL School of Data Science, Fudan University Xinlin Ren EMAIL School of Data Science, Fudan University Yanwei Fu EMAIL School of Data Science, Fudan University |
| Pseudocode | Yes | A.1 Multi-scale Training: The PyTorch pseudo-code of the multi-scale training is summarized in Alg. 1. |
| Open Source Code | Yes | Codes and models are released in https://github.com/ewrfcas/MVSFormer. |
| Open Datasets | Yes | Our methods are evaluated on DTU (Aanæs et al., 2016), Tanks-and-Temples (Knapitsch et al., 2017) and ETH3D (Schops et al., 2017). Since DTU data is collected in an indoor environment with fixed camera poses, our model is finetuned on the BlendedMVS dataset (Yao et al., 2020) with various scenes and objects to generalize to more complex environments in Tanks-and-Temples and ETH3D, as standard practice in Giang et al. (2021); Ding et al. (2021). |
| Dataset Splits | Yes | Our MVSFormer is evaluated on DTU with the official evaluation metrics of point clouds, i.e., accuracy, completeness, and the overall error. The testing resolution is fixed at 1152×1536 and the view number N = 5. Our submission of the full trainable MVSFormer has ranked Top-1 on both intermediate and advanced sets of the official Tanks-and-Temples leaderboard compared with other published works since May/2022. To show the robustness of the proposed method in scene data, we additionally evaluate MVSFormer on both the training and test sets of the high-resolution ETH3D (Schops et al., 2017) without re-training. |
| Hardware Specification | Yes | Thanks to mixed-precision training, it only takes about 22 and 15 hours for our proposed MVSFormer to be trained for 10 epochs on DTU (Aanæs et al., 2016) and BlendedMVS (Yao et al., 2020) respectively, with two V100 32GB NVIDIA Tesla GPUs. |
| Software Dependencies | No | The paper mentions "PyTorch pseudo-code" but does not specify a version for PyTorch or any other software dependencies. |
| Experiment Setup | Yes | MVSFormer is trained with the view number N = 5 and 4 coarse-to-fine stages of 32-16-8-4 depth hypotheses. CNN parts in MVSFormer are trained by Adam with a learning rate of 1e-3. The Twins-small part of MVSFormer is trained with a learning rate of 3e-5 and 0.01 weight decay, while DINO-small is frozen in MVSFormer-P. Our models are trained for 10 epochs on DTU and finetuned for another 10 epochs on BlendedMVS. The learning rate is warmed up over 500 steps and then decayed with a cosine scheduler. For the multi-scale training, we dynamically change the sub-batch from 8 to 2 according to the scales from 512 to 1280, with a maximum batch size of 8. More details are in Appendix Sec. A. |
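The multi-scale training row above says the sub-batch size shrinks from 8 to 2 as the training resolution grows from 512 to 1280, under a maximum batch size of 8. The paper's Alg. 1 (not reproduced here) gives the exact procedure; the sketch below only illustrates one plausible sizing rule, assuming linear interpolation between the two endpoints, which the quoted text does not actually specify.

```python
def sub_batch_size(scale, lo=512, hi=1280, max_sub=8, min_sub=2):
    """Pick a sub-batch size for a given training resolution.

    Assumption: linear interpolation between (lo, max_sub) and
    (hi, min_sub); the paper only states the endpoints, not the rule.
    """
    t = (scale - lo) / (hi - lo)          # 0 at the smallest scale, 1 at the largest
    t = min(max(t, 0.0), 1.0)             # clamp scales outside [lo, hi]
    return max(min_sub, round(max_sub - t * (max_sub - min_sub)))

# Larger images get smaller sub-batches so memory use stays roughly constant:
for s in (512, 896, 1280):
    print(s, sub_batch_size(s))
```

Gradients from the sub-batches would then be accumulated up to the effective batch size of 8 before each optimizer step, which keeps memory bounded at high resolutions without changing the effective batch size.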
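The Experiment Setup row also describes the learning-rate schedule: a 500-step linear warmup followed by cosine decay. A minimal self-contained sketch of that schedule, with `base_lr` and `total_steps` as illustrative placeholders (the paper reports a 1e-3 learning rate for the CNN parts but the total step count depends on dataset size and batch size):

```python
import math

def lr_at_step(step, base_lr=1e-3, warmup_steps=500, total_steps=10_000):
    """Linear warmup for `warmup_steps`, then cosine decay to zero.

    `total_steps` is a placeholder; the paper trains for 10 epochs per
    dataset, so the real value depends on steps per epoch.
    """
    if step < warmup_steps:
        # Ramp linearly from base_lr/warmup_steps up to base_lr.
        return base_lr * (step + 1) / warmup_steps
    # Cosine anneal from base_lr down to 0 over the remaining steps.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))
```

In PyTorch this is typically composed from `torch.optim.lr_scheduler.LambdaLR` or `CosineAnnealingLR`; the pure-Python form above just makes the shape of the schedule explicit.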