Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

ST$^2$360D: Spatial-to-Temporal Consistency for Training-free 360 Monocular Depth Estimation

Authors: Zidong Cao, Jinjing Zhu, Hao Ai, Lutao Jiang, Yuanhuiyi Lyu, Hui Xiong

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Extensive experiments demonstrate that ST$^2$360D achieves strong zero-shot capability on several datasets, supporting resolutions up to 4K.
Researcher Affiliation	Academia	(1) Thrust of Artificial Intelligence, HKUST (Guangzhou), China; (2) University of Birmingham, UK; (3) Department of Computer Science and Engineering, HKUST, Hong Kong SAR, China
Pseudocode	No	No specific pseudocode or algorithm blocks are present in the main body of the paper. The methodology is described in narrative text and mathematical formulations.
Open Source Code	Yes	Project page: https://caozidong.github.io/ST2360D_Depth/. All the code will be publicly available.
Open Datasets	Yes	Datasets. We evaluate on five 360 depth datasets with varying resolutions. We use Matterport3D [15] and Stanford2D3D [16] at 512×1024 resolution (504×1008 in Tab. 1); Matterport3D-2K [15] and Replica360-2K [51] at 1024×2048; and Replica360-4K [51] at the highest resolution of 2048×4096.
Dataset Splits	Yes	Zero-shot comparison on Matterport3D and Stanford2D3D datasets with 504×1008 input resolution, following [25].
Hardware Specification	Yes	All experiments are conducted on a single NVIDIA A40 GPU.
Software Dependencies	No	No specific software dependencies with version numbers are provided in the paper. It mentions using VDA [22] with ViT-Small as the backbone but not its specific software requirements.
Experiment Setup	Yes	By default, we employ VDA with ViT-Small as the backbone, with 252×252 patch resolution. Impact of FoV. As shown in Tab. 7, the FoV significantly influences the overall performance. Setting the FoV to 90° achieves optimal performance, whereas increasing it to 120° degrades the performance, likely due to redundant structural information from excessively large views. Impact of spatial resolution of patches. Tab. 8 shows that an intermediate patch resolution of 518×518, aligned with the training resolution of VDA [22], consistently yields the best performance.