RSA: Resolving Scale Ambiguities in Monocular Depth Estimators through Language Descriptions
Authors: Ziyao Zeng, Yangchao Wu, Hyoungseob Park, Daniel Wang, Fengyu Yang, Stefano Soatto, Dong Lao, Byung-Woo Hong, Alex Wong
NeurIPS 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate our method on recent general-purpose monocular depth models, indoors (NYUv2, VOID) and outdoors (KITTI). When trained on multiple datasets, RSA can serve as a general alignment module in zero-shot settings. Our method improves over common practices for aligning relative depth to metric depth and yields predictions comparable to an upper bound obtained by fitting relative depth to ground truth via a linear transformation (a minimal sketch of this fit follows the table). |
| Researcher Affiliation | Academia | Ziyao Zeng¹, Yangchao Wu², Hyoungseob Park¹, Daniel Wang¹, Fengyu Yang¹, Stefano Soatto², Dong Lao², Byung-Woo Hong³, Alex Wong¹ (¹Yale University, ²University of California, Los Angeles, ³Chung-Ang University) |
| Pseudocode | No | The paper describes the method and uses mathematical equations, but it does not include any clearly labeled 'Pseudocode' or 'Algorithm' blocks. |
| Open Source Code | Yes | Code is available at: https://github.com/Adonis-galaxy/RSA |
| Open Datasets | Yes | We present our main result on three datasets: NYUv2 [46] and VOID [58] for indoor scenes, and KITTI [15] for outdoor scenes. |
| Dataset Splits | Yes | NYUv2 contains images with a resolution of 480×640, with depth values ranging from 1×10⁻³ to 10 meters. We follow [29, 35, 79] for the dataset partition, which contains 24,231 train images and 654 test images. |
| Hardware Specification | Yes | We run our experiment on GeForce RTX 3090 GPUs with 24 GB of memory. |
| Software Dependencies | No | The paper mentions 'We use the Adam [25] optimizer,' but it does not specify version numbers for any software libraries, frameworks, or other dependencies. |
| Experiment Setup | Yes | We use the Adam [25] optimizer without weight decay. The learning rate is reduced from 3×10⁻⁵ to 1×10⁻⁵ by a cosine learning rate scheduler. The model is trained for 50 epochs under this scheduler. (A minimal sketch of this schedule follows the table.) |
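The "upper bound" referenced in the Research Type row fits relative depth to metric ground truth with a per-image linear transformation. Below is a minimal least-squares sketch of that fit, not the authors' implementation; the function and variable names are illustrative.

```python
import numpy as np

def fit_scale_shift(relative_depth, metric_depth, valid_mask=None):
    """Least-squares scale s and shift t so that s * relative_depth + t
    approximates the metric ground truth (per-image linear fit)."""
    d = relative_depth.reshape(-1)
    g = metric_depth.reshape(-1)
    if valid_mask is not None:                      # e.g. pixels with valid ground truth
        m = valid_mask.reshape(-1).astype(bool)
        d, g = d[m], g[m]
    A = np.stack([d, np.ones_like(d)], axis=1)      # design matrix [d, 1]
    (s, t), *_ = np.linalg.lstsq(A, g, rcond=None)  # minimize ||A @ [s, t] - g||^2
    return s, t

# Example: align a relative prediction, then evaluate it in metric units.
# s, t = fit_scale_shift(pred_relative, gt_metric, gt_metric > 0)
# pred_metric = s * pred_relative + t
```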
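The Experiment Setup row reports Adam without weight decay and a cosine schedule that lowers the learning rate from 3×10⁻⁵ to 1×10⁻⁵ over 50 epochs. The following is a minimal PyTorch sketch of that schedule; the model and training loop are placeholders, not the released code.

```python
import torch

model = torch.nn.Linear(512, 1)  # placeholder for the trained network
optimizer = torch.optim.Adam(model.parameters(), lr=3e-5, weight_decay=0.0)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(
    optimizer, T_max=50, eta_min=1e-5)  # cosine decay: 3e-5 -> 1e-5 over 50 epochs

for epoch in range(50):
    # ... one pass over the training batches would go here, e.g.
    #     loss = criterion(model(inputs), targets); loss.backward()
    optimizer.step()       # placeholder parameter update
    optimizer.zero_grad()
    scheduler.step()       # advance the cosine schedule once per epoch
```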