Crafting Monocular Cues and Velocity Guidance for Self-Supervised Multi-Frame Depth Learning

Authors: Xiaofeng Wang, Zheng Zhu, Guan Huang, Xu Chi, Yun Ye, Ziwei Chen, Xingang Wang

AAAI 2023

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments show MOVEDepth achieves state-of-the-art performance: compared with Monodepth2 and PackNet, the method relatively improves depth accuracy by 20% and 19.8% on the KITTI benchmark. MOVEDepth also generalizes to the more challenging DDAD benchmark, relatively outperforming ManyDepth by 7.2%. |
| Researcher Affiliation | Collaboration | Xiaofeng Wang (1,2), Zheng Zhu (3), Guan Huang (3), Xu Chi (3), Yun Ye (3), Ziwei Chen (4), Xingang Wang (1)*. 1: Institute of Automation, Chinese Academy of Sciences; 2: School of Artificial Intelligence, University of Chinese Academy of Sciences; 3: PhiGent Robotics; 4: Southeast University |
| Pseudocode | No | The paper describes methods in text and with diagrams but does not include any explicitly labeled "Pseudocode" or "Algorithm" blocks. |
| Open Source Code | Yes | The code is available at https://github.com/JeffWang987/MOVEDepth. |
| Open Datasets | Yes | MOVEDepth is evaluated on KITTI (Geiger, Lenz, and Urtasun 2012) and DDAD (Guizilini et al. 2019) to verify its effectiveness. |
| Dataset Splits | Yes | Following the Eigen split (Eigen and Fergus 2014), with data preprocessing from (Zhou et al. 2017), the data is divided into 39810/4424/697 training, validation, and test images. (A split-count sketch follows the table.) |
| Hardware Specification | Yes | MOVEDepth is trained on 4 NVIDIA RTX 3090 GPUs with batch size 6 on each GPU. (A launch-configuration sketch follows the table.) |
| Software Dependencies | No | The paper does not specify software dependencies with version numbers, such as a deep learning framework (e.g., PyTorch, TensorFlow) or the Python version. |
| Experiment Setup | Yes | MOVEDepth is trained with an input resolution of 640×192 (KITTI) and 640×384 (DDAD). Only two frames {I_{t-1}, I_t} are used for cost volume construction, and {I_{t-1}, I_t, I_{t+1}} for the reprojection loss. MOVEDepth is trained for 20 epochs and optimized with Adam (Kingma and Ba 2015). The learning rate is initially set to 0.0002 and decays by a factor of 10 for the final 5 epochs. Following (Godard et al. 2018; Watson et al. 2021), the loss weight γ is set to 0.001, and λ_i = 1 for i ∈ {1, 2, 3}. For MVS cost volume construction, the number of depth candidates is 16, the group correlation is G = 16, and β = 0.15. (Schedule and cost-volume sketches follow the table.) |
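The 39810/4424/697 figures quoted above match the Eigen (Zhou) split lists commonly shipped with self-supervised depth codebases. Below is a minimal sketch for verifying those counts, assuming Monodepth2-style split files; the paths are hypothetical placeholders for a local checkout, not files named in the paper.

```python
from pathlib import Path

# Hypothetical local paths to Monodepth2-style split lists; adjust as needed.
splits = {
    "train": "splits/eigen_zhou/train_files.txt",
    "val":   "splits/eigen_zhou/val_files.txt",
    "test":  "splits/eigen/test_files.txt",
}
for name, path in splits.items():
    n = len(Path(path).read_text().strip().splitlines())
    print(f"{name}: {n} samples")  # expected: 39810 / 4424 / 697
```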
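The hardware row (4 GPUs, batch size 6 per GPU) implies an effective batch size of 24. A minimal sketch of such a setup, assuming PyTorch DistributedDataParallel; the model, dataset, and launch command are stand-ins, since the paper does not state how training was distributed.

```python
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.utils.data import DataLoader, TensorDataset
from torch.utils.data.distributed import DistributedSampler

# Assumed launch: torchrun --nproc_per_node=4 train.py (one process per GPU).
dist.init_process_group("nccl")
local_rank = dist.get_rank() % torch.cuda.device_count()
torch.cuda.set_device(local_rank)

# Stand-ins for the depth/pose networks and the KITTI training set.
model = DDP(torch.nn.Linear(8, 1).cuda(), device_ids=[local_rank])
dataset = TensorDataset(torch.randn(39810, 8))

# Batch size 6 per GPU across 4 GPUs -> effective batch size 24.
loader = DataLoader(dataset, batch_size=6, sampler=DistributedSampler(dataset))
```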
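The quoted schedule (Adam, initial learning rate 0.0002, 20 epochs, tenfold decay over the final 5 epochs) maps onto a standard step schedule. A minimal sketch assuming PyTorch, with a placeholder model; the milestone epoch (15) is inferred from 20 epochs minus the final 5.

```python
import torch

model = torch.nn.Linear(8, 1)  # stand-in for the MOVEDepth networks

optimizer = torch.optim.Adam(model.parameters(), lr=2e-4)
# Decay the learning rate by a factor of 10 for the final 5 of 20 epochs.
scheduler = torch.optim.lr_scheduler.MultiStepLR(
    optimizer, milestones=[15], gamma=0.1)

for epoch in range(20):
    # ... one training epoch over the data loader goes here ...
    scheduler.step()
```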
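The MVS branch builds a cost volume over 16 depth candidates with group correlation G = 16. Below is a minimal sketch of group-wise correlation in the style of GwcNet, assuming PyTorch; the feature shapes are hypothetical, and the per-candidate warping of source features (which depends on intrinsics, pose, and the paper's velocity guidance) is replaced by random stand-ins.

```python
import torch

B, C, H, W = 1, 32, 48, 160  # hypothetical feature map shape
D, G = 16, 16                # depth candidates and correlation groups

ref_feat = torch.randn(B, C, H, W)
# In practice each slice along dim 1 is the source feature warped to depth
# candidate d; random tensors stand in for the warped features here.
warped_src = torch.randn(B, D, C, H, W)

ref_groups = ref_feat.view(B, 1, G, C // G, H, W)
src_groups = warped_src.view(B, D, G, C // G, H, W)
# Mean inner product within each channel group -> (B, G, D, H, W) volume.
cost_volume = (ref_groups * src_groups).mean(dim=3).permute(0, 2, 1, 3, 4)
print(cost_volume.shape)  # torch.Size([1, 16, 16, 48, 160])
```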