Task-Aware Monocular Depth Estimation for 3D Object Detection

Authors: Xinlong Wang, Wei Yin, Tao Kong, Yuning Jiang, Lei Li, Chunhua Shen

AAAI 2020, pp. 12257-12264 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We propose the foreground-background separated monocular depth estimation (ForeSeE) method, to estimate the foreground and background depth using separate optimization objectives and decoders. Our method significantly improves the depth estimation performance on foreground objects. Applying ForeSeE to 3D object detection, we achieve 7.5 AP gains and set new state-of-the-art results among other monocular methods.
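The two-decoder design quoted above can be illustrated with a minimal PyTorch sketch. The module name `ForeSeEHead`, the channel sizes, the depth-as-classification output, and the pixel-wise max fusion rule are all illustrative assumptions, not the authors' released implementation.

```python
# Minimal sketch of the foreground/background separated decoder idea.
# Shared encoder features feed two decoders with separate objectives.
import torch
import torch.nn as nn

class ForeSeEHead(nn.Module):
    """Two depth decoders on shared features: one specialized for
    foreground objects, one for background (assumed structure)."""
    def __init__(self, in_channels: int = 256, depth_bins: int = 80):
        super().__init__()
        def decoder():
            return nn.Sequential(
                nn.Conv2d(in_channels, 128, 3, padding=1),
                nn.ReLU(inplace=True),
                nn.Conv2d(128, depth_bins, 1),
            )
        self.fg_decoder = decoder()  # optimized mainly on foreground pixels
        self.bg_decoder = decoder()  # optimized mainly on background pixels

    def forward(self, feats: torch.Tensor):
        fg_logits = self.fg_decoder(feats)
        bg_logits = self.bg_decoder(feats)
        # One plausible fusion at inference: pixel-wise max over the two
        # predictions; the paper's exact fusion rule may differ.
        fused = torch.maximum(fg_logits, bg_logits)
        return fg_logits, bg_logits, fused
```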
Researcher Affiliation | Collaboration | The University of Adelaide, Australia; Bytedance AI Lab
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks.
Open Source Code | No | Code will be available at: https://github.com/WXinlong/ForeSeE.
Open Datasets | Yes | The KITTI dataset (Geiger et al. 2013) has witnessed inspiring progress in the field of depth estimation. As most scenes in the KITTI-Raw data have limited foreground objects, we construct a new benchmark based on the KITTI-Object dataset. We collect the corresponding ground-truth depth map for each image in the KITTI-Object training set and term it the KITTI-Object-Depth (KOD) dataset. A total of 7,481 image-depth pairs are divided into training and testing subsets with 3,712 and 3,769 samples respectively (Chen et al. 2015), which ensures that images in the two subsets belong to different video clips. 2D bounding boxes are used to distinguish foreground and background pixels: pixels that fall within the foreground bounding boxes are designated as foreground, while all other pixels are assigned to the background.
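The foreground/background pixel assignment described in this row is straightforward to reproduce. Below is a small NumPy sketch, assuming boxes in (x1, y1, x2, y2) pixel coordinates; the function name and box format are illustrative, not taken from the paper.

```python
# Pixels inside any 2D ground-truth box -> foreground; the rest -> background.
import numpy as np

def fg_bg_mask(boxes, height, width):
    """Return a boolean mask that is True for foreground pixels."""
    mask = np.zeros((height, width), dtype=bool)
    for x1, y1, x2, y2 in boxes:
        x1, y1 = max(int(x1), 0), max(int(y1), 0)
        x2, y2 = min(int(round(x2)), width), min(int(round(y2)), height)
        mask[y1:y2, x1:x2] = True
    return mask

# Example: two boxes on a 375x1242 (KITTI-sized) image.
mask = fg_bg_mask([(100, 150, 300, 320), (500, 160, 700, 330)], 375, 1242)
fg_pixels, bg_pixels = mask.sum(), (~mask).sum()
```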
Dataset Splits | No | "A total of 7,481 image-depth pairs are divided into training and testing subsets with 3,712 and 3,769 samples respectively (Chen et al. 2015)." The paper only specifies training and testing subsets; it does not mention a validation split.
Hardware Specification | No | "The Stochastic Gradient Descent (SGD) solver is adopted to optimize the network on a single GPU." The GPU model is not specified.
Software Dependencies | No | The paper mentions 'ImageNet pretrained ResNeXt-101' but does not provide specific software dependencies with version numbers.
Experiment Setup | Yes | For depth estimation, we follow most of the settings in the baseline method (Wei et al. 2019). The ImageNet-pretrained ResNeXt-101 (Xie et al. 2017) is used as the backbone model. We train the network for 20 epochs, with batch size 4 and base learning rate set to 0.001. The Stochastic Gradient Descent (SGD) solver is adopted to optimize the network on a single GPU. λf and λb in the foreground-background sensitive loss function are set to 0.2.
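The reported hyperparameters can be collected into a short training-configuration sketch. The weighted form of the foreground-background sensitive loss below, where each branch down-weights pixels outside its own region by λf or λb, is our reading of this row; the paper's exact formulation, as well as the SGD momentum, are assumptions.

```python
# Reported settings: SGD, single GPU, 20 epochs, batch size 4, base lr 0.001,
# lambda_f = lambda_b = 0.2. The loss form below is an assumed interpretation.
import torch

lambda_f, lambda_b = 0.2, 0.2

def fg_bg_sensitive_loss(pred_fg, pred_bg, target, fg_mask, base_loss):
    """base_loss must return a per-pixel loss map (reduction='none').
    Assumed form: the foreground branch keeps full weight on foreground
    pixels and down-weights background pixels by lambda_f; the background
    branch does the reverse with lambda_b."""
    fg_mask = fg_mask.float()
    w_fg = fg_mask + (1.0 - fg_mask) * lambda_f
    w_bg = (1.0 - fg_mask) + fg_mask * lambda_b
    return (w_fg * base_loss(pred_fg, target)).mean() \
         + (w_bg * base_loss(pred_bg, target)).mean()

# Optimizer as reported; momentum is unstated in the paper, so assumed here.
model = torch.nn.Conv2d(3, 1, 3, padding=1)  # placeholder network
optimizer = torch.optim.SGD(model.parameters(), lr=0.001, momentum=0.9)
```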