BetterDepth: Plug-and-Play Diffusion Refiner for Zero-Shot Monocular Depth Estimation
Authors: Xiang Zhang, Bingxin Ke, Hayko Riemenschneider, Nando Metzger, Anton Obukhov, Markus Gross, Konrad Schindler, Christopher Schroers
NeurIPS 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | With efficient training on small-scale synthetic datasets, BetterDepth achieves state-of-the-art zero-shot MDE performance on diverse public datasets and on in-the-wild scenes. (Section 4, Experiments and Analysis) |
| Researcher Affiliation | Collaboration | ETH Zürich, Disney Research\|Studios |
| Pseudocode | Yes | Algorithm 1: BetterDepth Training Procedure |
| Open Source Code | No | This is research done in collaboration with a corporate research lab and we haven't been able to get clearance to release the code. |
| Open Datasets | Yes | We follow Marigold [17] and use 74K samples from two synthetic datasets, Hypersim [33] and Virtual KITTI [2], for training. Also, the NeurIPS checklist mentions: Hypersim: https://github.com/apple/ml-hypersim, Virtual KITTI: https://europe.naverlabs.com/research-old2/computer-vision/proxy-virtual-worlds-vkitti-2/ |
| Dataset Splits | No | We follow Marigold [17] and use 74K samples from two synthetic datasets, Hypersim [33] and Virtual KITTI [2], for training. For evaluation, we employ five unseen datasets: NYUv2 [28] (654 samples), KITTI [11] (652 samples from the Eigen test split [9]), ETH3D [43] (454 samples), ScanNet [6] (800 samples based on the Marigold split [17]), and DIODE [45] (325 indoor samples and 446 outdoor ones)... The paper defines training datasets and evaluation datasets, but not a distinct validation split. (The evaluation splits are summarized in the sketch after the table.) |
| Hardware Specification | Yes | The training takes around 1.5 days on a single NVIDIA RTX A6000 GPU. ... on an NVIDIA GeForce RTX 4090 GPU. |
| Software Dependencies | No | The paper mentions using 'Adam optimizer [18]' but does not provide specific version numbers for software dependencies like Python, PyTorch, or CUDA. |
| Experiment Setup | Yes | BetterDepth is trained for 5K iterations with batch size 32. The Adam optimizer [18] is used with the learning rate set to 3×10⁻⁵. We set the patch size w = 8 and the masking threshold η = 0.1 under the depth range [−1, 1]. For inference, we apply the DDIM scheduler with 50-step sampling [44] and obtain the final result with 10 test-time ensemble members [17]. (See the configuration sketch after the table.) |
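
The evaluation protocol quoted in the Dataset Splits row can be written down as a small data structure for reference. This is a minimal sketch only: the `EVAL_SETS` name and its field layout are hypothetical, while the dataset names, split sources, and sample counts are those reported in the table above.

```python
# Hedged sketch of the zero-shot evaluation sets reported by the paper.
# The dict name and keys are illustrative; counts and split sources are quoted above.
EVAL_SETS = {
    "NYUv2":   {"samples": 654, "split": "official test split"},
    "KITTI":   {"samples": 652, "split": "Eigen test split"},
    "ETH3D":   {"samples": 454, "split": "full set"},
    "ScanNet": {"samples": 800, "split": "Marigold split"},
    "DIODE":   {"samples": 325 + 446, "split": "indoor + outdoor"},
}

# Example use: report the total number of zero-shot evaluation samples.
total = sum(d["samples"] for d in EVAL_SETS.values())
print(f"Total evaluation samples: {total}")
```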
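
Likewise, the Experiment Setup row can be collected into a configuration sketch. The dictionary names and keys below are assumptions made for illustration; only the values (5K iterations, batch size 32, Adam with learning rate 3×10⁻⁵, patch size 8, masking threshold 0.1, 50 DDIM steps, 10 test-time ensemble members) come from the paper.

```python
# Hedged sketch of the reported training and inference settings.
# Key names are illustrative; values are quoted from the Experiment Setup row.
TRAIN_CONFIG = {
    "iterations": 5_000,
    "batch_size": 32,
    "optimizer": "Adam",
    "learning_rate": 3e-5,
    "patch_size": 8,            # w in the paper
    "masking_threshold": 0.1,   # eta, under the normalized depth range [-1, 1]
}

INFERENCE_CONFIG = {
    "scheduler": "DDIM",
    "sampling_steps": 50,
    "test_time_ensemble": 10,   # ensemble members, following Marigold [17]
}
```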