Task-Aware Monocular Depth Estimation for 3D Object Detection
Authors: Xinlong Wang, Wei Yin, Tao Kong, Yuning Jiang, Lei Li, Chunhua Shen | pp. 12257-12264
AAAI 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We propose the foreground-background separated monocular depth estimation (ForeSeE) method, to estimate the foreground and background depth using separate optimization objectives and decoders. Our method significantly improves the depth estimation performance on foreground objects. Applying ForeSeE to 3D object detection, we achieve 7.5 AP gains and set new state-of-the-art results among other monocular methods. |
| Researcher Affiliation | Collaboration | 1The University of Adelaide, Australia, 2Bytedance AI Lab |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | No | Code will be available at: https://github.com/WXinlong/ForeSeE. |
| Open Datasets | Yes | The KITTI dataset (Geiger et al. 2013) has witnessed inspiring progress in the field of depth estimation. As most scenes in the KITTI-Raw data have limited foreground objects, we construct a new benchmark based on the KITTI-Object dataset. We collect the corresponding ground-truth depth map for each image in the KITTI-Object training set, and term it the KITTI-Object-Depth (KOD) dataset. A total of 7,481 image-depth pairs are divided into training and testing subsets with 3,712 and 3,769 samples respectively (Chen et al. 2015), which ensures that images in the two subsets belong to different video clips. 2D bounding boxes are used to distinguish foreground and background pixels: pixels falling within the foreground bounding boxes are designated as foreground, while all other pixels are assigned to the background. |
| Dataset Splits | No | A total of 7,481 image-depth pairs are divided into training and testing subsets with 3,712 and 3,769 samples respectively (Chen et al. 2015) - The paper only specifies training and testing subsets, without mentioning a validation split or set. |
| Hardware Specification | No | The Stochastic Gradient Descent (SGD) solver is adopted to optimize the network on a single GPU. |
| Software Dependencies | No | The paper mentions an 'ImageNet-pretrained ResNeXt-101' but does not provide specific software dependencies with version numbers. |
| Experiment Setup | Yes | For depth estimation, we follow most of the settings in the baseline method (Wei et al. 2019). The ImageNet-pretrained ResNeXt-101 (Xie et al. 2017) is used as the backbone model. We train the network for 20 epochs, with batch size 4 and base learning rate set to 0.001. The Stochastic Gradient Descent (SGD) solver is adopted to optimize the network on a single GPU. λf and λb in the foreground-background sensitive loss function are both set to 0.2. |
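The excerpts above describe two mechanical details that can be sketched in code: assigning foreground/background pixels from 2D bounding boxes, and weighting the two resulting loss terms with λf = λb = 0.2. The sketch below is a minimal illustration under assumptions: the paper inherits its base per-pixel depth loss from Wei et al. 2019, but a plain L1 error is substituted here as a stand-in, and the function names (`fg_bg_masks`, `fg_bg_sensitive_loss`) are hypothetical, not from the paper or its code release.

```python
import numpy as np

def fg_bg_masks(boxes, h, w):
    """Mark pixels inside any 2D box as foreground; everything else is background.

    boxes: iterable of (x1, y1, x2, y2) in pixel coordinates.
    Returns a boolean (h, w) foreground mask and its complement.
    """
    fg = np.zeros((h, w), dtype=bool)
    for x1, y1, x2, y2 in boxes:
        fg[int(y1):int(y2), int(x1):int(x2)] = True
    return fg, ~fg

def fg_bg_sensitive_loss(pred, gt, fg_mask, lam_f=0.2, lam_b=0.2):
    """Weighted sum of foreground and background depth losses.

    L1 error is an assumed stand-in for the paper's base loss;
    lam_f = lam_b = 0.2 matches the values reported in the setup.
    """
    err = np.abs(pred - gt)
    bg_mask = ~fg_mask
    l_f = err[fg_mask].mean() if fg_mask.any() else 0.0
    l_b = err[bg_mask].mean() if bg_mask.any() else 0.0
    return lam_f * l_f + lam_b * l_b
```

With both weights equal, the effect is simply to balance the contribution of the (typically small) foreground region against the much larger background region, rather than letting the per-pixel average be dominated by background pixels.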