Metric from Human: Zero-shot Monocular Metric Depth Estimation via Test-time Adaptation

Authors: Yizhou Zhao, Hengwei Bian, Kaihua Chen, Pengliang Ji, Liao Qu, Shao-yu Lin, Weichen Yu, Haoran Li, Hao Chen, Jun Shen, Bhiksha Raj, Min Xu

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Through qualitative and quantitative experiments, we demonstrate the superiority and generalization ability of our MfH in zero-shot MMDE, needless of any metric depth annotations."
Researcher Affiliation | Collaboration | ¹Carnegie Mellon University, Pittsburgh; ²University of Wollongong, Wollongong
Pseudocode | No | The paper describes its pipeline with textual descriptions and figures, but provides no formal pseudocode or algorithm blocks.
Open Source Code | Yes | https://github.com/Skaldak/MfH
Open Datasets | Yes | "Specifically, we evaluate the zero-shot MMDE capability of MfH on NYU-Depth V2 [52], iBims-1 [53], ETH3D [54] with the split from [13] and official masks, and KITTI [55] with the corrected Eigen split from [51]."
Dataset Splits | No | The paper mentions "test sets" and "test-time training" but gives no explicit breakdown of training/validation/test splits, with percentages or counts, for the evaluation datasets; it refers to existing splits without detailing them.
Hardware Specification | Yes | "All experiments are run on one NVIDIA A100 GPU."
Software Dependencies | No | "We adopt Depth Anything [2] without finetuning on metric annotations as our MRDE model, Stable Diffusion v2 [57] for generative painting, and HMR 2.0 [20] for human mesh recovery." The paper names specific software (e.g., Stable Diffusion v2, HMR 2.0) but omits version numbers for the underlying stack (no PyTorch, Python, or CUDA versions are given in the text).
Experiment Setup | Yes | "In L_SIlog, we follow ZoeDepth [12] to set λ = 0.15. For optimizing the alignment parameters {s_n}, {t_n}, we leverage linear regression to obtain a closed-form solution. As for optimizing the metric head parameters s, t, we use the L-BFGS optimizer with a fixed learning rate of 1 for 50 steps. Unless otherwise specified, we randomly paint 32 images for our comparison experiments and 4 for our ablation studies."
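The alignment step quoted above (fitting scale and shift parameters by linear regression) has a standard closed-form least-squares solution. A minimal NumPy sketch under stated assumptions: the helper name `align_scale_shift` and the arrays `d_rel` (relative depth values) and `d_met` (metric depth targets) are illustrative choices, not identifiers from the paper or its codebase.

```python
import numpy as np

def align_scale_shift(d_rel, d_met):
    # Closed-form least-squares fit of scale s and shift t so that
    # s * d_rel + t ≈ d_met, via the linear system [d_rel, 1] @ [s, t] = d_met.
    # (Hypothetical helper illustrating the linear-regression alignment step.)
    A = np.stack([d_rel, np.ones_like(d_rel)], axis=1)
    (s, t), *_ = np.linalg.lstsq(A, d_met, rcond=None)
    return s, t

# Synthetic check: recover a known scale/shift from noiseless data.
rng = np.random.default_rng(0)
d_rel = rng.uniform(0.1, 1.0, size=100)
d_met = 3.0 * d_rel + 0.5
s, t = align_scale_shift(d_rel, d_met)  # s ≈ 3.0, t ≈ 0.5
```

The paper's second stage instead optimizes the metric head parameters s, t iteratively with L-BFGS (learning rate 1, 50 steps), which is useful when the objective is not a plain squared error and no closed form exists.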