Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
ST$^2$360D: Spatial-to-Temporal Consistency for Training-free 360 Monocular Depth Estimation
Authors: Zidong Cao, Jinjing Zhu, Hao Ai, Lutao Jiang, Yuanhuiyi Lyu, Hui Xiong
NeurIPS 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments demonstrate that ST$^2$360D achieves strong zero-shot capability on several datasets, supporting resolutions up to 4K. |
| Researcher Affiliation | Academia | $^1$Thrust of Artificial Intelligence, HKUST (Guangzhou), China; $^2$University of Birmingham, UK; $^3$Department of Computer Science and Engineering, HKUST, Hong Kong SAR, China |
| Pseudocode | No | No specific pseudocode or algorithm blocks are present in the main body of the paper. The methodology is described in narrative text and mathematical formulations. |
| Open Source Code | Yes | Project page: https://caozidong.github.io/ST2360D_Depth/. All the code will be publicly available. |
| Open Datasets | Yes | Datasets. We evaluate on five 360 depth datasets with varying resolutions. We use Matterport3D [15] and Stanford2D3D [16] at 512 × 1024 resolution (504 × 1008 in Tab. 1); Matterport3D-2K [15] and Replica360-2K [51] at 1024 × 2048; and Replica360-4K [51] at the highest resolution of 2048 × 4096. |
| Dataset Splits | Yes | Zero-shot comparison on Matterport3D and Stanford2D3D datasets with 504 × 1008 input resolution, following [25]. |
| Hardware Specification | Yes | All experiments are conducted on a single NVIDIA A40 GPU. |
| Software Dependencies | No | No specific software dependencies with version numbers are provided in the paper. It mentions using VDA [22] with ViT-Small as the backbone but not its specific software requirements. |
| Experiment Setup | Yes | By default, we employ VDA with ViT-Small as the backbone, with 252 × 252 patch resolution. Impact of FoV. As shown in Tab. 7, the FoV significantly influences the overall performance. Setting the FoV to 90° achieves optimal performance, whereas increasing it to 120° degrades the performance, likely due to redundant structural information from excessively large views. Impact of spatial resolution of patches. Tab. 8 shows that an intermediate patch resolution of 518 × 518, aligned with the training resolution of VDA [22], consistently yields the best performance. |
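
The Experiment Setup row refers to perspective patches with a given FoV and patch resolution taken from a 360 image. The sketch below illustrates one common way such a patch can be sampled from an equirectangular panorama. It is not the authors' implementation: the paper contains no pseudocode, and the function names `perspective_grid` and `sample_patch`, the yaw/pitch parameterization, and the nearest-neighbor sampling are all assumptions made for illustration.

```python
import numpy as np


def perspective_grid(fov_deg, size, yaw_deg=0.0, pitch_deg=0.0):
    """Longitude/latitude (radians) seen by each pixel of a square pinhole view.

    Hypothetical helper: a size x size view with the given field of view,
    rotated by pitch (up is positive) and then yaw (right is positive).
    """
    half = np.tan(np.radians(fov_deg) / 2.0)
    # Pixel centres on the image plane z = 1, spanning [-half, half].
    coords = ((np.arange(size) + 0.5) / size * 2.0 - 1.0) * half
    x, y = np.meshgrid(coords, coords)
    # Image rows grow downward, so the upward axis is -y.
    dirs = np.stack([x, -y, np.ones_like(x)], axis=-1)
    dirs /= np.linalg.norm(dirs, axis=-1, keepdims=True)

    p, t = np.radians(pitch_deg), np.radians(yaw_deg)
    rot_x = np.array([[1.0, 0.0, 0.0],
                      [0.0, np.cos(p), np.sin(p)],
                      [0.0, -np.sin(p), np.cos(p)]])
    rot_y = np.array([[np.cos(t), 0.0, np.sin(t)],
                      [0.0, 1.0, 0.0],
                      [-np.sin(t), 0.0, np.cos(t)]])
    dirs = dirs @ (rot_y @ rot_x).T  # apply pitch first, then yaw

    lon = np.arctan2(dirs[..., 0], dirs[..., 2])
    lat = np.arcsin(np.clip(dirs[..., 1], -1.0, 1.0))
    return lon, lat


def sample_patch(erp, fov_deg=90.0, size=252, yaw_deg=0.0, pitch_deg=0.0):
    """Nearest-neighbor sample of a perspective patch from an equirectangular image."""
    h, w = erp.shape[:2]
    lon, lat = perspective_grid(fov_deg, size, yaw_deg, pitch_deg)
    # Longitude [-pi, pi] maps to columns [0, w); latitude [pi/2, -pi/2] to rows [0, h).
    u = ((lon / (2.0 * np.pi) + 0.5) * w).astype(int) % w
    v = np.clip(((0.5 - lat / np.pi) * h).astype(int), 0, h - 1)
    return erp[v, u]
```

With the defaults above, `sample_patch(panorama)` would return a 252 × 252 patch covering a 90° FoV, matching the default values quoted in the table; a real pipeline would more likely use bilinear interpolation than nearest-neighbor lookup.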