Metric from Human: Zero-shot Monocular Metric Depth Estimation via Test-time Adaptation
Authors: Yizhou Zhao, Hengwei Bian, Kaihua Chen, Pengliang Ji, Liao Qu, Shao-yu Lin, Weichen Yu, Haoran Li, Hao Chen, Jun Shen, Bhiksha Raj, Min Xu
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Through qualitative and quantitative experiments, we demonstrate the superiority and generalization ability of our MfH in zero-shot MMDE, needless of any metric depth annotations. |
| Researcher Affiliation | Collaboration | ¹Carnegie Mellon University, Pittsburgh; ²University of Wollongong, Wollongong |
| Pseudocode | No | The paper describes its pipeline with textual descriptions and figures, but no formal pseudocode or algorithm blocks are provided. |
| Open Source Code | Yes | https://github.com/Skaldak/MfH |
| Open Datasets | Yes | Specifically, we evaluate the zero-shot MMDE capability of MfH on NYU-Depth V2 [52], IBims-1 [53], ETH3D [54] with the split from [13] and official masks, and KITTI [55] with the corrected Eigen split from [51]. |
| Dataset Splits | No | The paper mentions "test sets" and "test-time training" but does not specify the explicit breakdown of training, validation, and test splits with percentages or counts for the datasets used for evaluation. It refers to existing splits but doesn't detail them. |
| Hardware Specification | Yes | All experiments are run on one NVIDIA A100 GPU. |
| Software Dependencies | No | We adopt Depth Anything [2] without finetuning on metric annotations as our MRDE model, Stable Diffusion v2 [57] for generative painting, and HMR 2.0 [20] for human mesh recovery. In the SILog loss L_SIlog, we follow ZoeDepth [12] to set λ = 0.15. The paper names specific software (Stable Diffusion v2, HMR 2.0) but does not report a complete dependency environment (e.g., no PyTorch, Python, or CUDA versions are given in the text). |
| Experiment Setup | Yes | In the SILog loss L_SIlog, we follow ZoeDepth [12] to set λ = 0.15. For optimizing the alignment parameters {sn}, {tn}, we leverage linear regression to obtain a closed-form solution. As for optimizing the metric head parameters, s, t, we use the L-BFGS optimizer with a fixed learning rate of 1 for 50 steps. Unless otherwise specified, we randomly paint 32 images for our comparison experiments and 4 for our ablation studies. |