Learning Monocular Depth in Dynamic Environment via Context-aware Temporal Attention
Authors: Zizhang Wu, Zhuozheng Li, Zhi-Gang Fan, Yunzhe Wu, Yuanzhu Gan, Jian Pu
IJCAI 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Section 4 (Experiments and Results): We conduct extensive experiments on three challenging benchmarks to validate the effectiveness of our pipeline against state-of-the-art models. |
| Researcher Affiliation | Collaboration | Zizhang Wu¹, Zhuozheng Li¹, Zhi-Gang Fan¹, Yunzhe Wu¹, Yuanzhu Gan¹ and Jian Pu²; ¹Zongmu Tech, ²Fudan University. wuzizhang87@gmail.com, {zhuozheng.li, zhigang.fan, nelson.wu, yuanzhu.gan}@zongmutech.com, jianpu@fudan.edu.cn |
| Pseudocode | No | The paper describes the proposed modules and their processes, but no explicit pseudocode or algorithm blocks are provided. |
| Open Source Code | No | The paper does not contain an explicit statement about releasing the source code or a link to a code repository for the described methodology. |
| Open Datasets | Yes | KITTI. KITTI dataset [Geiger et al., 2012] is a popular benchmark for the task of autonomous driving, which provides over 93,000 depth maps with corresponding raw LiDAR scans and RGB images aligned with raw data. ...Virtual KITTI 2. VKITTI2 dataset [Gaidon et al., 2016] is widely used for video understanding tasks... nuScenes. nuScenes dataset [Caesar et al., 2020] is a large-scale multi-modal autonomous driving dataset... |
| Dataset Splits | Yes | In experiments, we follow the widely-used KITTI Eigen split [Eigen et al., 2014] for network training, which is composed of 22,600 images from 32 scenes for training and 697 images from 29 scenes for testing. |
| Hardware Specification | Yes | Given the same Nvidia RTX A6000 GPU on the KITTI dataset |
| Software Dependencies | No | The paper states 'We implement our CTA-Depth in PyTorch' but does not provide specific version numbers for PyTorch or any other software dependencies. |
| Experiment Setup | Yes | We implement our CTA-Depth in PyTorch and train it for 100 epochs with a mini-batch size of 4. The learning rate is 2×10⁻⁴ for both depth and pose refinement, which is decayed by a constant step (gamma=0.5 and step size=30). We set β₁ = 0.9 and β₂ = 0.999 in the Adam optimizer. We resize the input images to 320×960 for training, and set the number of sequential images to 2 for CTA-Refiner by balancing both computation efficiency and prediction accuracy. For long-range geometry embedding, the number of temporally adjacent images is set to N = 3. We fix m at 3 and n at 4 in experiments. (A hedged PyTorch sketch of this configuration follows the table.) |
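
The quoted setup maps directly onto standard PyTorch optimizer and scheduler APIs. Below is a minimal sketch of that configuration; since the authors released no code, the model here is a hypothetical placeholder, and only the hyperparameters (Adam betas, learning rate, StepLR-style decay, epoch count, batch size, input resolution) come from the quote above.

```python
import torch

# Hypothetical stand-in for the paper's CTA-Depth network (no official code
# is available); the real architecture is not reproduced here.
model = torch.nn.Conv2d(3, 1, kernel_size=3, padding=1)

# Adam with beta1=0.9, beta2=0.999 and a learning rate of 2e-4,
# as stated for both depth and pose refinement.
optimizer = torch.optim.Adam(model.parameters(), lr=2e-4, betas=(0.9, 0.999))

# "Decayed by a constant step (gamma=0.5 and step size=30)" maps naturally
# onto PyTorch's StepLR scheduler.
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=30, gamma=0.5)

# 100 epochs, mini-batch size 4, inputs resized to 320x960 (H x W).
num_epochs, batch_size, input_size = 100, 4, (320, 960)

for epoch in range(num_epochs):
    # ... per-batch forward/backward passes over 320x960 frames would go here ...
    scheduler.step()  # halve the learning rate every 30 epochs
```

Note that this assumes the decay is applied per epoch, which matches the stated step size of 30 over a 100-epoch schedule; the paper does not spell out the scheduler implementation.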