Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. [K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026].

ST$^2$360D: Spatial-to-Temporal Consistency for Training-free 360 Monocular Depth Estimation

Authors: Zidong Cao, Jinjing Zhu, Hao Ai, Lutao Jiang, Yuanhuiyi Lyu, Hui Xiong

NeurIPS 2025

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | Extensive experiments demonstrate that ST$^2$360D achieves strong zero-shot capability on several datasets, supporting resolutions up to 4K. |
| Researcher Affiliation | Academia | (1) Thrust of Artificial Intelligence, HKUST (Guangzhou), China; (2) University of Birmingham, UK; (3) Department of Computer Science and Engineering, HKUST, Hong Kong SAR, China |
| Pseudocode | No | No specific pseudocode or algorithm blocks are present in the main body of the paper. The methodology is described in narrative text and mathematical formulations. |
| Open Source Code | Yes | Project page: https://caozidong.github.io/ST2360D_Depth/. All the code will be publicly available. |
| Open Datasets | Yes | Datasets. We evaluate on five 360° depth datasets with varying resolutions. We use Matterport3D [15] and Stanford2D3D [16] at 512×1024 resolution (504×1008 in Tab. 1); Matterport3D-2K [15] and Replica360-2K [51] at 1024×2048; and Replica360-4K [51] at the highest resolution of 2048×4096. |
| Dataset Splits | Yes | Zero-shot comparison on Matterport3D and Stanford2D3D datasets with 504×1008 input resolution, following [25]. |
| Hardware Specification | Yes | All experiments are conducted on a single NVIDIA A40 GPU. |
| Software Dependencies | No | No specific software dependencies with version numbers are provided in the paper. It mentions using VDA [22] with ViT-Small as the backbone but not its specific software requirements. |
| Experiment Setup | Yes | By default, we employ VDA with ViT-Small as the backbone, with 252×252 patch resolution. Impact of FoV. As shown in Tab. 7, the FoV significantly influences the overall performance. Setting the FoV to 90° achieves optimal performance, whereas increasing it to 120° degrades the performance, likely due to redundant structural information from excessively large views. Impact of spatial resolution of patches. Tab. 8 shows that an intermediate patch resolution of 518×518, aligned with the training resolution of VDA [22], consistently yields the best performance. |
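The FoV and patch-resolution settings quoted under Experiment Setup can be related through the standard pinhole camera model. The following is a minimal sketch and not the authors' code: the function names are hypothetical, and it assumes square perspective patches with the same FoV horizontally and vertically, which this summary does not state.

```python
import math


def focal_length_px(fov_deg: float, size_px: int) -> float:
    """Pinhole focal length (in pixels) of a patch spanning fov_deg across size_px pixels."""
    if not 0.0 < fov_deg < 180.0:
        raise ValueError("fov_deg must lie strictly between 0 and 180")
    return (size_px / 2.0) / math.tan(math.radians(fov_deg) / 2.0)


def sphere_fraction(fov_deg: float) -> float:
    """Fraction of the full sphere covered by one square patch (assumed square FoV)."""
    if not 0.0 < fov_deg < 180.0:
        raise ValueError("fov_deg must lie strictly between 0 and 180")
    half = math.radians(fov_deg) / 2.0
    # Solid angle of a right square pyramid with full apex angle fov_deg on both axes.
    solid_angle = 4.0 * math.asin(math.sin(half) ** 2)
    return solid_angle / (4.0 * math.pi)


if __name__ == "__main__":
    # FoV and patch sizes taken from the Experiment Setup row above.
    for fov in (90, 120):
        for size in (252, 518):
            f = focal_length_px(fov, size)
            frac = sphere_fraction(fov)
            print(f"FoV={fov} deg, patch={size}x{size}: f={f:.2f} px, sphere fraction={frac:.3f}")
```

Under these assumptions a 90° patch covers one sixth of the sphere (the same as one cube-map face), while a 120° patch covers a larger share, so neighbouring patches would overlap more. Whether that overlap accounts for the degradation reported in Tab. 7 is the paper's conjecture; the sketch only illustrates the geometry.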