Crafting Monocular Cues and Velocity Guidance for Self-Supervised Multi-Frame Depth Learning
Authors: Xiaofeng Wang, Zheng Zhu, Guan Huang, Xu Chi, Yun Ye, Ziwei Chen, Xingang Wang
AAAI 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments show MOVEDepth achieves state-of-the-art performance: Compared with Monodepth2 and PackNet, our method relatively improves the depth accuracy by 20% and 19.8% on the KITTI benchmark. MOVEDepth also generalizes to the more challenging DDAD benchmark, relatively outperforming ManyDepth by 7.2%. |
| Researcher Affiliation | Collaboration | Xiaofeng Wang1,2, Zheng Zhu3, Guan Huang3, Xu Chi3, Yun Ye3, Ziwei Chen4, Xingang Wang1* 1 Institute of Automation, Chinese Academy of Sciences 2 School of Artificial Intelligence, University of Chinese Academy of Sciences 3 PhiGent Robotics 4 Southeast University |
| Pseudocode | No | The paper describes methods in text and with diagrams but does not include any explicitly labeled 'Pseudocode' or 'Algorithm' blocks. |
| Open Source Code | Yes | The code is available at https://github.com/JeffWang987/MOVEDepth. |
| Open Datasets | Yes | MOVEDepth is evaluated on KITTI (Geiger, Lenz, and Urtasun 2012) and DDAD (Guizilini et al. 2019) to verify the effectiveness. |
| Dataset Splits | Yes | Following the Eigen split (Eigen and Fergus 2014), with data preprocessing from (Zhou et al. 2017), the data is divided into 39810/4424/697 training, validation, and test images. |
| Hardware Specification | Yes | MOVEDepth is trained on 4 NVIDIA RTX 3090 GPUs with batch size 6 on each GPU. |
| Software Dependencies | No | The paper does not specify software dependencies with version numbers, such as specific deep learning frameworks (e.g., PyTorch, TensorFlow) or their versions, or Python version. |
| Experiment Setup | Yes | MOVEDepth is trained with an input resolution of 640×192 (KITTI) and 640×384 (DDAD). We only use two frames {It−1, It} for cost volume construction, and use {It−1, It, It+1} for reprojection loss. We train MOVEDepth for 20 epochs and optimize it with Adam (Kingma and Ba 2015). The learning rate is initially set as 0.0002, which decays by a factor of 10 for the final 5 epochs. Following (Godard et al. 2018; Watson et al. 2021), the loss weight γ is set as 0.001, and λi = 1 (i ∈ {1, 2, 3}). For MVS cost volume construction, the number of depth candidates is 16, group correlation G = 16, and β = 0.15. |
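The learning-rate schedule quoted in the Experiment Setup row (initial rate 0.0002, decayed by a factor of 10 for the final 5 of 20 epochs) can be sketched as a per-epoch lookup. The paper does not state its framework or any helper names, so the function and parameter names below are illustrative assumptions:

```python
def movedepth_lr(epoch: int,
                 base_lr: float = 2e-4,
                 total_epochs: int = 20,
                 decay_epochs: int = 5,
                 decay_factor: float = 10.0) -> float:
    """Return the learning rate for a 0-indexed epoch.

    Mirrors the schedule described in the paper: base_lr for the
    first 15 epochs, then base_lr / 10 for the final 5 epochs.
    """
    if epoch >= total_epochs - decay_epochs:
        return base_lr / decay_factor
    return base_lr

# Epochs 0-14 train at 2e-4; epochs 15-19 train at 2e-5.
assert movedepth_lr(0) == 2e-4
assert movedepth_lr(14) == 2e-4
assert movedepth_lr(15) == 2e-5
```

In a PyTorch-style setup this would correspond to a step decay with a single milestone at epoch 15 (e.g. `MultiStepLR(optimizer, milestones=[15], gamma=0.1)`), but the paper's codebase should be consulted for the exact implementation.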