RSA: Resolving Scale Ambiguities in Monocular Depth Estimators through Language Descriptions

Authors: Ziyao Zeng, Yangchao Wu, Hyoungseob Park, Daniel Wang, Fengyu Yang, Stefano Soatto, Dong Lao, Byung-Woo Hong, Alex Wong

NeurIPS 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We demonstrate our method on recent general-purpose monocular depth models on indoors (NYUv2, VOID) and outdoors (KITTI). When trained on multiple datasets, RSA can serve as a general alignment module in zero-shot settings. Our method improves over common practices in aligning relative to metric depth and results in predictions that are comparable to an upper bound of fitting relative depth to ground truth via a linear transformation. (A sketch of this scale-and-shift fit appears after the table.)
Researcher Affiliation | Academia | Ziyao Zeng¹, Yangchao Wu², Hyoungseob Park¹, Daniel Wang¹, Fengyu Yang¹, Stefano Soatto², Dong Lao², Byung-Woo Hong³, Alex Wong¹; ¹Yale University, ²University of California, Los Angeles, ³Chung-Ang University
Pseudocode | No | The paper describes the method and uses mathematical equations, but it does not include any clearly labeled 'Pseudocode' or 'Algorithm' blocks.
Open Source Code | Yes | Code is available at: https://github.com/Adonis-galaxy/RSA
Open Datasets | Yes | We present our main result on three datasets: NYUv2 [46] and VOID [58] for indoor scenes, and KITTI [15] for outdoor scenes.
Dataset Splits | Yes | NYUv2 contains images with a resolution of 480 × 640, with depth values from 1 × 10^-3 to 10 meters. We follow [29, 35, 79] for the dataset partition, which contains 24,231 training images and 654 test images. (A depth-range masking sketch for evaluation appears after the table.)
Hardware Specification | Yes | We run our experiment on GeForce RTX 3090 GPUs, with 24 GB memory.
Software Dependencies | No | The paper mentions 'We use the Adam [25] optimizer,' but it does not specify version numbers for any software libraries, frameworks, or other dependencies.
Experiment Setup | Yes | We use the Adam [25] optimizer without weight decay. The learning rate is reduced from 3 × 10^-5 to 1 × 10^-5 by a cosine learning rate scheduler. The model is trained for 50 epochs under this scheduler. (A training-schedule sketch appears after the table.)
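
The result summary above mentions an upper bound obtained by fitting relative depth to ground truth via a linear transformation. Below is a minimal sketch of that alignment step, assuming NumPy; the function and variable names (`fit_scale_shift`, `relative_depth`, `gt_depth`, `valid_mask`) are hypothetical, and this is an illustration of least-squares scale-and-shift fitting, not the authors' released code.

```python
import numpy as np

def fit_scale_shift(relative_depth, gt_depth, valid_mask):
    """Least-squares fit of scale s and shift t so that s * relative_depth + t
    best matches the metric ground truth over the valid pixels."""
    r = relative_depth[valid_mask].reshape(-1).astype(np.float64)
    g = gt_depth[valid_mask].reshape(-1).astype(np.float64)
    A = np.stack([r, np.ones_like(r)], axis=1)       # [N, 2] design matrix
    (s, t), *_ = np.linalg.lstsq(A, g, rcond=None)   # minimize ||A @ [s, t] - g||^2
    return s, t

# Hypothetical usage: rel and gt are H x W depth maps, mask marks valid pixels.
rel = np.random.rand(480, 640)
gt = 2.5 * rel + 0.3
mask = gt > 0
s, t = fit_scale_shift(rel, gt, mask)
metric_pred = s * rel + t                            # aligned metric-depth estimate
```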
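
The dataset-splits row states that NYUv2 depth values range from 1 × 10^-3 to 10 meters. The sketch below shows the common practice of masking ground truth to that range before computing an error metric such as absolute relative error; the helper names (`valid_depth_mask`, `abs_rel_error`) are hypothetical and the masking convention is an assumption, not quoted from the paper.

```python
import numpy as np

MIN_DEPTH, MAX_DEPTH = 1e-3, 10.0   # NYUv2 depth range reported in the paper

def valid_depth_mask(gt_depth):
    """Boolean mask of pixels whose ground truth lies inside the evaluated range."""
    return (gt_depth > MIN_DEPTH) & (gt_depth < MAX_DEPTH)

def abs_rel_error(pred, gt):
    """Absolute relative error over valid pixels (a standard depth metric)."""
    mask = valid_depth_mask(gt)
    return float(np.mean(np.abs(pred[mask] - gt[mask]) / gt[mask]))
```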
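
The experiment-setup row specifies Adam without weight decay and a cosine learning-rate schedule from 3 × 10^-5 to 1 × 10^-5 over 50 epochs. Below is a minimal PyTorch sketch of that schedule, with a placeholder module standing in for the RSA model; the specific scheduler class is an assumption, not taken from the authors' code.

```python
import torch

NUM_EPOCHS = 50  # training length reported in the paper

model = torch.nn.Linear(10, 1)  # placeholder standing in for the RSA model
optimizer = torch.optim.Adam(model.parameters(), lr=3e-5, weight_decay=0.0)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(
    optimizer, T_max=NUM_EPOCHS, eta_min=1e-5  # cosine decay from 3e-5 down to 1e-5
)

for epoch in range(NUM_EPOCHS):
    # ... one pass over the training set would go here ...
    scheduler.step()  # advance the cosine schedule once per epoch
```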