Metric from Human: Zero-shot Monocular Metric Depth Estimation via Test-time Adaptation

Authors: Yizhou Zhao, Hengwei Bian, Kaihua Chen, Pengliang Ji, Liao Qu, Shao-yu Lin, Weichen Yu, Haoran Li, Hao Chen, Jun Shen, Bhiksha Raj, Min Xu

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Through qualitative and quantitative experiments, we demonstrate the superiority and generalization ability of our MfH in zero-shot MMDE, needless of any metric depth annotations."
Researcher Affiliation | Collaboration | ¹Carnegie Mellon University, Pittsburgh; ²University of Wollongong, Wollongong
Pseudocode | No | The paper describes its pipeline with textual descriptions and figures, but provides no formal pseudocode or algorithm blocks.
Open Source Code | Yes | https://github.com/Skaldak/MfH
Open Datasets | Yes | "Specifically, we evaluate the zero-shot MMDE capability of MfH on NYU-Depth V2 [52], iBims-1 [53], ETH3D [54] with the split from [13] and official masks, and KITTI [55] with the corrected Eigen split from [51]."
Dataset Splits | No | The paper mentions "test sets" and "test-time training" but gives no explicit breakdown of training/validation/test splits, with percentages or counts, for the evaluation datasets; it refers to existing splits without detailing them.
Hardware Specification | Yes | "All experiments are run on one NVIDIA A100 GPU."
Software Dependencies | No | "We adopt Depth Anything [2] without finetuning on metric annotations as our MRDE model, Stable Diffusion v2 [57] for generative painting, and HMR 2.0 [20] for human mesh recovery." The paper names specific software (e.g., Stable Diffusion v2, HMR 2.0) but omits version numbers for the underlying stack (no PyTorch, Python, or CUDA versions are given in the text).
Experiment Setup | Yes | "In L_SIlog, we follow ZoeDepth [12] to set λ = 0.15. For optimizing the alignment parameters {s_n}, {t_n}, we leverage linear regression to obtain a closed-form solution. As for optimizing the metric head parameters s, t, we use the L-BFGS optimizer with a fixed learning rate of 1 for 50 steps. Unless otherwise specified, we randomly paint 32 images for our comparison experiments and 4 for our ablation studies."
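The alignment step quoted above (fitting scale and shift parameters by linear regression) has a standard closed-form least-squares solution. A minimal NumPy sketch under stated assumptions: the helper name `align_scale_shift` and the arrays `d_rel` (relative depth values) and `d_met` (metric depth targets) are illustrative choices, not identifiers from the paper or its codebase.

```python
import numpy as np

def align_scale_shift(d_rel, d_met):
    # Closed-form least-squares fit of scale s and shift t so that
    # s * d_rel + t ≈ d_met, via the linear system [d_rel, 1] @ [s, t] = d_met.
    # (Hypothetical helper illustrating the linear-regression alignment step.)
    A = np.stack([d_rel, np.ones_like(d_rel)], axis=1)
    (s, t), *_ = np.linalg.lstsq(A, d_met, rcond=None)
    return s, t

# Synthetic check: recover a known scale/shift from noiseless data.
rng = np.random.default_rng(0)
d_rel = rng.uniform(0.1, 1.0, size=100)
d_met = 3.0 * d_rel + 0.5
s, t = align_scale_shift(d_rel, d_met)  # s ≈ 3.0, t ≈ 0.5
```

The paper's second stage instead optimizes the metric head parameters s, t iteratively with L-BFGS (learning rate 1, 50 steps), which is useful when the objective is not a plain squared error and no closed form exists.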