BEVFusion: A Simple and Robust LiDAR-Camera Fusion Framework
Authors: Tingting Liang, Hongwei Xie, Kaicheng Yu, Zhongyu Xia, Zhiwei Lin, Yongtao Wang, Tao Tang, Bing Wang, Zhi Tang
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We empirically show that our framework surpasses the state-of-the-art methods under the normal training settings. Under the robustness training settings that simulate various LiDAR malfunctions, our framework significantly surpasses the state-of-the-art methods by 15.7% to 28.9% mAP. *(A hedged sketch of one such malfunction simulation appears after the table.)* |
| Researcher Affiliation | Collaboration | 1) Wangxuan Institute of Computer Technology, Peking University, China; 2) DAMO Academy, Alibaba Group, China; 3) Shenzhen Campus of Sun Yat-sen University, China |
| Pseudocode | No | The paper does not contain a clearly labeled pseudocode or algorithm block. |
| Open Source Code | Yes | The code is available at https://github.com/ADLab-AutoDrive/BEVFusion. |
| Open Datasets | Yes | We conduct comprehensive experiments on a large-scale autonomous-driving dataset for 3D detection, nuScenes [2]. |
| Dataset Splits | Yes | We conduct comprehensive experiments on a large-scale autonomous-driving dataset for 3D detection, nuScenes [2]. ... On the nuScenes dataset, our simple framework shows great generalization ability. Following the same training settings [20, 59, 1], BEVFusion improves PointPillars and CenterPoint by 18.4% and 7.1% in mean average precision (mAP) respectively, and achieves a superior performance of 69.2% mAP comparing to 68.9% mAP of TransFusion [1], which is considered as state-of-the-art. |
| Hardware Specification | No | The paper does not explicitly describe the hardware used for experiments, such as specific GPU or CPU models. |
| Software Dependencies | No | We implement our network in PyTorch using the open-sourced MMDetection3D [8]. The paper mentions software components but does not provide specific version numbers for them. |
| Experiment Setup | Yes | We set the image size to 448 × 800 and the voxel size following the official settings of the LiDAR stream [20, 59, 1]. Our training consists of two stages: i) We first train the LiDAR stream and camera stream with LiDAR point clouds input and multi-view image input, respectively. Specifically, we train both streams following their official settings in MMDetection3D [8]; ii) We then train BEVFusion for another 9 epochs that inherit weights from two trained streams. |
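
The two-stage schedule quoted in the Experiment Setup row (train each stream on its own modality, then fine-tune the fused model from the inherited weights) can be pictured with a minimal PyTorch sketch. Everything below is illustrative: the class names, placeholder linear layers, and checkpoint paths are assumptions, not the authors' code, which is built on MMDetection3D in the linked repository.

```python
import torch
import torch.nn as nn

# Minimal stand-ins for the two single-modality streams. The real models are
# full 3D detectors built on MMDetection3D; these names and layers are
# illustrative placeholders only.
class CameraStream(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Linear(256, 128)  # placeholder for the image-to-BEV encoder

    def forward(self, x):
        return self.encoder(x)

class LiDARStream(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Linear(256, 128)  # placeholder for the point-cloud encoder

    def forward(self, x):
        return self.encoder(x)

class BEVFusionSketch(nn.Module):
    """Fused model whose streams inherit weights from the stage-one checkpoints."""
    def __init__(self):
        super().__init__()
        self.camera = CameraStream()
        self.lidar = LiDARStream()
        self.head = nn.Linear(256, 10)  # placeholder detection head

    def forward(self, img_feat, pts_feat):
        fused = torch.cat([self.camera(img_feat), self.lidar(pts_feat)], dim=-1)
        return self.head(fused)

# Stage i): train each stream on its own modality (training loops omitted),
# then save the resulting weights.
torch.save(CameraStream().state_dict(), "camera_stream.pth")
torch.save(LiDARStream().state_dict(), "lidar_stream.pth")

# Stage ii): initialize the fusion model from both checkpoints and fine-tune
# for another 9 epochs, rather than training from random initialization.
model = BEVFusionSketch()
model.camera.load_state_dict(torch.load("camera_stream.pth"))
model.lidar.load_state_dict(torch.load("lidar_stream.pth"))
```

The point the sketch captures is that stage ii) starts from both stage-i) checkpoints rather than from scratch, which is what "inherit weights from two trained streams" refers to.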
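
The robustness numbers quoted in the Research Type row come from training settings that simulate LiDAR malfunctions. One such malfunction is a restricted field of view; the NumPy sketch below drops points outside a frontal sector as a rough approximation. The function name, the 120° default, and the (x, y, z, intensity) point layout are assumptions for illustration, and the paper's actual malfunction protocols may differ.

```python
import numpy as np

def limit_lidar_fov(points: np.ndarray, fov_deg: float = 120.0) -> np.ndarray:
    """Keep only points whose azimuth lies inside a frontal field of view.

    `points` is an (N, 4) array of (x, y, z, intensity); the sensor looks
    along +x. This is an illustrative approximation of a limited-FOV
    malfunction, not the paper's exact protocol.
    """
    azimuth = np.degrees(np.arctan2(points[:, 1], points[:, 0]))
    mask = np.abs(azimuth) <= fov_deg / 2.0
    return points[mask]

# Example: degrade a full 360-degree scan to a simulated 120-degree sensor.
scan = np.random.randn(100_000, 4).astype(np.float32)
degraded = limit_lidar_fov(scan, fov_deg=120.0)
print(scan.shape, "->", degraded.shape)
```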