Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

MVSMamba: Multi-View Stereo with State Space Model

Authors: Jianfei Jiang, Qiankun Liu, Hongyuan Liu, Haochen Yu, Liyong Wang, Jiansheng Chen, Huimin Ma

NeurIPS 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experimental results demonstrate MVSMamba outperforms state-of-the-art MVS methods on the DTU dataset and the Tanks-and-Temples benchmark with both superior performance and efficiency.
Researcher Affiliation | Academia | University of Science and Technology Beijing, China
Pseudocode | No | The paper describes the methodology in prose and mathematical expressions (e.g., Section 3, Section 3.2) but does not include any explicitly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | The source code is available at https://github.com/JianfeiJ/MVSMamba.
Open Datasets | Yes | We conduct experiments on three of the most widely used datasets in the field of MVS. (1) DTU [58] is an indoor dataset... (2) Tanks-and-Temples [59] is a large-scale benchmark... (3) BlendedMVS [62] is a large-scale synthetic dataset...
Dataset Splits | Yes | Following the MVSNet [9] protocol, we split the dataset into training, validation, and evaluation sets, resulting in a total of 27,097 training samples. ... For DTU training, we use 5-view input images at a resolution of 512 × 640, with a batch size of 4 for 15 epochs.
Hardware Specification | Yes | We use NVIDIA RTX A6000 GPUs for training and NVIDIA RTX 3090 for evaluation.
Software Dependencies | No | MVSMamba is implemented using PyTorch [60] and optimized with the Adam optimizer [61].
Experiment Setup | Yes | For DTU training, we use 5-view input images at a resolution of 512 × 640, with a batch size of 4 for 15 epochs. The initial learning rate is set to 0.001 and is halved at the 10-th, 12-th, and 14-th epochs. For fine-tuning on BlendedMVS, we use 11-view images at a resolution of 576 × 768 with a batch size of 2 for 15 epochs. The initial learning rate is 0.0005 and is reduced by half at the 6-th, 8-th, 10-th, and 12-th epochs. Additionally, consistent with [55, 30, 20], we conduct high-resolution training on DTU using 5-view images at 1024 × 1280 resolution for 10 epochs, with an initial learning rate of 0.001, halved at the 6-th, 8-th, and 9-th epochs. The number of inverse depth hypotheses in four coarse-to-fine scales is set to 32-16-8-4, with corresponding depth intervals of 2-1-1-0.5, and the group correlation of 4-4-4-4.
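
The step-wise learning-rate schedules quoted in the Experiment Setup row can be made concrete with a short sketch. The Python below is illustrative only and is not taken from the MVSMamba repository: the class name `StageSchedule`, the stage labels, and the `SCHEDULES` tuple are hypothetical, and the code assumes 0-based epoch indexing with the rate halved once for every milestone at or before the current epoch. The quoted text does not state the indexing convention, so the exact epoch at which each halving takes effect may differ by one in the authors' setup.

```python
from bisect import bisect_right
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class StageSchedule:
    """One training stage, with values copied from the Experiment Setup row."""

    name: str                    # hypothetical label, not from the paper
    initial_lr: float            # initial learning rate
    epochs: int                  # total number of training epochs
    milestones: Tuple[int, ...]  # epochs at which the rate is halved

    def lr_at(self, epoch: int) -> float:
        # Assumption: 0-based epochs; the rate is halved once for every
        # milestone that is less than or equal to the current epoch.
        return self.initial_lr * 0.5 ** bisect_right(self.milestones, epoch)


SCHEDULES = (
    StageSchedule("dtu", 1e-3, 15, (10, 12, 14)),
    StageSchedule("blendedmvs_finetune", 5e-4, 15, (6, 8, 10, 12)),
    StageSchedule("dtu_high_res", 1e-3, 10, (6, 8, 9)),
)

# Print the per-epoch learning rate for each stage.
for stage in SCHEDULES:
    rates = [stage.lr_at(epoch) for epoch in range(stage.epochs)]
    print(stage.name, rates)
```

The closed form used here (initial rate times 0.5 raised to the number of milestones passed) matches the behaviour of a multi-step decay scheduler with a decay factor of 0.5, which is one plausible way to realise the quoted schedule in PyTorch; the paper excerpt does not say which scheduler was actually used.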