BEVFusion: A Simple and Robust LiDAR-Camera Fusion Framework

Authors: Tingting Liang, Hongwei Xie, Kaicheng Yu, Zhongyu Xia, Zhiwei Lin, Yongtao Wang, Tao Tang, Bing Wang, Zhi Tang

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We empirically show that our framework surpasses the state-of-the-art methods under the normal training settings. Under the robustness training settings that simulate various LiDAR malfunctions, our framework significantly surpasses the state-of-the-art methods by 15.7% to 28.9% mAP."
Researcher Affiliation | Collaboration | "1 Wangxuan Institute of Computer Technology, Peking University, China; 2 DAMO Academy, Alibaba Group, China; 3 Shenzhen Campus of Sun Yat-sen University, China"
Pseudocode | No | The paper does not contain a clearly labeled pseudocode or algorithm block.
Open Source Code | Yes | "The code is available at https://github.com/ADLab-AutoDrive/BEVFusion."
Open Datasets | Yes | "We conduct comprehensive experiments on a large-scale autonomous-driving dataset for 3D detection, nuScenes [2]."
Dataset Splits | Yes | "We conduct comprehensive experiments on a large-scale autonomous-driving dataset for 3D detection, nuScenes [2]. ... On the nuScenes dataset, our simple framework shows great generalization ability. Following the same training settings [20, 59, 1], BEVFusion improves PointPillars and CenterPoint by 18.4% and 7.1% in mean average precision (mAP) respectively, and achieves a superior performance of 69.2% mAP comparing to 68.9% mAP of TransFusion [1], which is considered as state-of-the-art." (A hedged sketch of loading the official nuScenes splits appears after this table.)
Hardware Specification | No | The paper does not explicitly describe the hardware used for experiments, such as specific GPU or CPU models.
Software Dependencies | No | "We implement our network in PyTorch using the open-sourced MMDetection3D [8]." The paper mentions software components but does not provide specific version numbers for them. (A snippet for recording the installed versions appears after this table.)
Experiment Setup | Yes | "We set the image size to 448×800 and the voxel size following the official settings of the LiDAR stream [20, 59, 1]. Our training consists of two stages: i) We first train the LiDAR stream and camera stream with multi-view image input and LiDAR point clouds input, respectively. Specifically, we train both streams following their LiDAR official settings in MMDetection3D [8]; ii) We then train BEVFusion for another 9 epochs that inherit weights from two trained streams." (An illustrative sketch of this two-stage schedule appears after this table.)
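
For readers checking how the nuScenes train/val division is defined, the official scene-level splits come with the nuscenes-devkit rather than the paper itself. The sketch below is a minimal illustration, assuming a local copy of the dataset at a placeholder path; BEVFusion consumes these splits indirectly through MMDetection3D's nuScenes converter.

```python
# Minimal sketch of loading the official nuScenes train/val splits with the
# nuscenes-devkit. The dataroot path is a placeholder, not taken from the paper.
from nuscenes.nuscenes import NuScenes
from nuscenes.utils.splits import create_splits_scenes

nusc = NuScenes(version='v1.0-trainval', dataroot='/data/nuscenes', verbose=True)
splits = create_splits_scenes()  # dict with 'train', 'val', 'test' scene name lists

train_scenes = [s for s in nusc.scene if s['name'] in splits['train']]
val_scenes = [s for s in nusc.scene if s['name'] in splits['val']]
print(len(train_scenes), len(val_scenes))  # 700 train / 150 val scenes in v1.0-trainval
```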
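
Because the paper names PyTorch and MMDetection3D without version numbers, anyone attempting reproduction has to pin an environment themselves. The snippet below is not from the paper; it simply records whatever versions the local environment provides so they can be documented alongside results.

```python
# Record the installed versions of the software stack named in the paper.
# The paper does not specify versions, so these reflect the local environment only.
import torch
import mmcv
import mmdet
import mmdet3d

for name, mod in [('torch', torch), ('mmcv', mmcv), ('mmdet', mmdet), ('mmdet3d', mmdet3d)]:
    print(f'{name}=={mod.__version__}')
print('CUDA available:', torch.cuda.is_available())
```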
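
The quoted two-stage schedule (train each stream separately, then fine-tune the fused model for 9 epochs from the two sets of weights) can be sketched in plain PyTorch. Everything below is illustrative: the module names (BEVFusionModel, NuScenesFusionDataset), checkpoint paths, and optimizer settings are assumptions, not the authors' released configuration, which lives in their MMDetection3D-based repository.

```python
# Hypothetical sketch of BEVFusion's stage-ii training: inherit weights from the
# independently trained LiDAR and camera streams, then train the fused model for
# 9 epochs. Module names, paths, and hyperparameters are placeholders.
import torch
from torch.utils.data import DataLoader

from my_models import BEVFusionModel           # assumed: fusion model wrapping both streams
from my_datasets import NuScenesFusionDataset  # assumed: yields (images, points, targets)

model = BEVFusionModel(img_size=(448, 800))    # image size quoted from the paper

# Stage ii inherits weights from the two stage-i checkpoints (paths are placeholders).
lidar_ckpt = torch.load('work_dirs/lidar_stream/latest.pth', map_location='cpu')
camera_ckpt = torch.load('work_dirs/camera_stream/latest.pth', map_location='cpu')
model.lidar_stream.load_state_dict(lidar_ckpt['state_dict'], strict=False)
model.camera_stream.load_state_dict(camera_ckpt['state_dict'], strict=False)

loader = DataLoader(NuScenesFusionDataset(split='train'), batch_size=4, shuffle=True)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4, weight_decay=1e-2)  # assumed values

model.cuda().train()
for epoch in range(9):  # "another 9 epochs" per the quoted setup
    for images, points, targets in loader:
        losses = model(images.cuda(), [p.cuda() for p in points], targets)
        loss = sum(losses.values())
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```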