Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026.
MVSMamba: Multi-View Stereo with State Space Model
Authors: Jianfei Jiang, Qiankun Liu, Hongyuan Liu, Haochen Yu, Liyong Wang, Jiansheng Chen, Huimin Ma
NeurIPS 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experimental results demonstrate MVSMamba outperforms state-of-the-art MVS methods on the DTU dataset and the Tanks-and-Temples benchmark with both superior performance and efficiency. |
| Researcher Affiliation | Academia | University of Science and Technology Beijing, China |
| Pseudocode | No | The paper describes the methodology in prose and mathematical expressions (e.g., Section 3, Section 3.2) but does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | The source code is available at https://github.com/JianfeiJ/MVSMamba. |
| Open Datasets | Yes | We conduct experiments on three of the most widely used datasets in the field of MVS. (1) DTU [58] is an indoor dataset... (2) Tanks-and-Temples [59] is a large-scale benchmark... (3) BlendedMVS [62] is a large-scale synthetic dataset... |
| Dataset Splits | Yes | Following the MVSNet [9] protocol, we split the dataset into training, validation, and evaluation sets, resulting in a total of 27,097 training samples. ... For DTU training, we use 5-view input images at a resolution of 512 × 640, with a batch size of 4 for 15 epochs. |
| Hardware Specification | Yes | We use NVIDIA RTX A6000 GPUs for training and NVIDIA RTX 3090 for evaluation. |
| Software Dependencies | No | MVSMamba is implemented using PyTorch [60] and optimized with the Adam optimizer [61]. |
| Experiment Setup | Yes | For DTU training, we use 5-view input images at a resolution of 512 × 640, with a batch size of 4 for 15 epochs. The initial learning rate is set to 0.001 and is halved at the 10-th, 12-th, and 14-th epochs. For fine-tuning on BlendedMVS, we use 11-view images at a resolution of 576 × 768 with a batch size of 2 for 15 epochs. The initial learning rate is 0.0005 and is reduced by half at the 6-th, 8-th, 10-th, and 12-th epochs. Additionally, consistent with [55, 30, 20], we conduct high-resolution training on DTU using 5-view images at 1024 × 1280 resolution for 10 epochs, with an initial learning rate of 0.001, halved at 6-th, 8-th, and 9-th epochs. The number of inverse depth hypotheses in four coarse-to-fine scales is set to 32-16-8-4, with corresponding depth intervals of 2-1-1-0.5, and the group correlation of 4-4-4-4. |
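
The learning-rate schedules quoted in the Experiment Setup row can be sketched as a step decay. This is a minimal illustration, not the authors' training code. It assumes that epochs are counted from 0 and that the rate is halved as soon as the epoch counter reaches a milestone (the convention used by step schedulers such as PyTorch's `MultiStepLR` with `gamma=0.5`); the paper excerpt does not state which indexing it uses. The names `SCHEDULES`, `lr_at_epoch`, and `schedule`, and the dictionary keys, are hypothetical.

```python
from bisect import bisect_right

# Values quoted from the Experiment Setup row; the keys are hypothetical labels.
SCHEDULES = {
    "dtu": {"base_lr": 1e-3, "milestones": [10, 12, 14], "epochs": 15},
    "blendedmvs_finetune": {"base_lr": 5e-4, "milestones": [6, 8, 10, 12], "epochs": 15},
    "dtu_highres": {"base_lr": 1e-3, "milestones": [6, 8, 9], "epochs": 10},
}


def lr_at_epoch(base_lr, milestones, epoch, gamma=0.5):
    """Learning rate for a 0-indexed epoch (assumed indexing convention).

    The rate is multiplied by gamma once for every milestone <= epoch.
    `milestones` must be sorted in ascending order.
    """
    reached = bisect_right(milestones, epoch)  # number of milestones already hit
    return base_lr * gamma ** reached


def schedule(name):
    """Per-epoch learning rates for one of the three quoted training stages."""
    cfg = SCHEDULES[name]
    return [
        lr_at_epoch(cfg["base_lr"], cfg["milestones"], epoch)
        for epoch in range(cfg["epochs"])
    ]


if __name__ == "__main__":
    for stage in SCHEDULES:
        print(stage, schedule(stage))
```

Under this assumed convention, the DTU stage ends after three halvings and the BlendedMVS fine-tuning stage after four. If the paper instead counts epochs from 1, every milestone shifts by one epoch.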