BEVDepth: Acquisition of Reliable Depth for Multi-View 3D Object Detection
Authors: Yinhao Li, Zheng Ge, Guanyi Yu, Jinrong Yang, Zengran Wang, Yukang Shi, Jianjian Sun, Zeming Li
AAAI 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Aided by customized Efficient Voxel Pooling and multi-frame mechanism, BEVDepth achieves the new state-of-the-art 60.9% NDS on the challenging nuScenes test set while maintaining high efficiency. |
| Researcher Affiliation | Collaboration | Yinhao Li1,2, Zheng Ge3, Guanyi Yu3, Jinrong Yang4, Zengran Wang3, Yukang Shi5, Jianjian Sun3, Zeming Li3 — 1Key Laboratory of Intelligent Information Processing, Institute of Computing Technology, CAS; 2University of Chinese Academy of Sciences; 3MEGVII Technology; 4Huazhong University of Science and Technology; 5Xi'an Jiaotong University. liyinhao20@mails.ucas.edu.cn, {gezheng, yuguanyi, yangjinrong, wangzengran, shiyukang, sunjianjian, lizeming}@megvii.com |
| Pseudocode | No | The paper does not contain any sections or figures explicitly labeled as 'Pseudocode' or 'Algorithm', nor does it present structured steps in a code-like format. |
| Open Source Code | Yes | Codes have been released: https://github.com/Megvii-BaseDetection/BEVDepth |
| Open Datasets | Yes | To validate the power of BEVDepth, we test it on the nuScenes (Caesar et al. 2020) dataset, a well-known benchmark in the field of 3D detection. |
| Dataset Splits | Yes | Dataset and Metrics: the nuScenes (Caesar et al. 2020) dataset is a large-scale autonomous driving benchmark containing data from six cameras, one LiDAR, and five radars. There are 1000 scenarios in the dataset, which are divided into 700, 150, and 150 scenes for training, validation, and testing, respectively. |
| Hardware Specification | No | The paper mentions optimization for the "great parallelism of GPU" and "CUDA thread" efficiency, but it does not provide specific hardware details such as GPU models, CPU types, or memory configurations used for the experiments. |
| Software Dependencies | No | The paper refers to optimizers (AdamW) and models (ResNet-50, CenterPoint) by name and citation, but it does not specify version numbers for any software dependencies such as programming languages, frameworks, or libraries (e.g., Python, PyTorch, CUDA). |
| Experiment Setup | Yes | Unless otherwise specified, we use ResNet-50 (He et al. 2016) as the image backbone and the image size is processed to 256 × 704. Following (Huang et al. 2021), we adopt image data augmentations including random cropping, random scaling, random flipping, and random rotation, and also adopt BEV data augmentations including random scaling, random flipping, and random rotation. We use AdamW (Loshchilov and Hutter 2017) as the optimizer with a learning rate of 2e-4 and a batch size of 64. For the ablation study, all experiments are trained for 24 epochs without using the CBGS strategy (Zhu et al. 2019). When compared to other methods, BEVDepth is trained for 20 epochs with CBGS. The camera-aware DepthNet is placed at the feature level with stride 16. |
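The hyperparameters quoted in the Experiment Setup row can be collected into a minimal configuration sketch. This is an illustrative summary only, not code from the BEVDepth repository; the key names are assumptions chosen for readability:

```python
# Hedged sketch: training hyperparameters reported in the paper,
# gathered into a plain dict (key names are illustrative).
bevdepth_train_config = {
    "image_backbone": "ResNet-50",       # He et al. 2016
    "image_size": (256, 704),            # processed input resolution (H, W)
    "optimizer": "AdamW",                # Loshchilov and Hutter 2017
    "learning_rate": 2e-4,
    "batch_size": 64,
    "epochs_ablation": 24,               # ablation study, without CBGS
    "epochs_comparison": 20,             # comparison runs, with CBGS
    "depthnet_feature_stride": 16,       # camera-aware DepthNet placement
}
```

Note that the paper reports no software versions or hardware specifications, so a full reproduction would still require choices beyond what this sketch captures.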